PROJECT BACKGROUND: There is a huge demand for used cars in the Indian market today. As sales of new cars have slowed down in the recent past, the pre-owned car market has continued to grow and is now larger than the new car market. Cars4U is a budding tech start-up that aims to find a foothold in this market.
In 2018-19, while new car sales were recorded at 3.6 million units, around 4 million second-hand cars were bought and sold. There is a slowdown in new car sales and that could mean that the demand is shifting towards the pre-owned market. In fact, some car owners replace their old vehicles with pre-owned cars instead of buying a new automobile.
Unlike new cars, where price and supply are fairly deterministic and managed by OEMs (Original Equipment Manufacturers; dealership-level discounts come into play only in the last stage of the customer journey), the used car market is a very different beast, with large uncertainties in both pricing and supply. Several factors, including mileage, brand, model, and year, can influence the actual worth of a car. From the perspective of a seller, it is not an easy task to set the correct price of a used car. With this in mind, an accurate pricing scheme for used cars becomes important in order to grow in this market.
BUSINESS CONTEXT: Cars4U, a tech startup in India, wants to sell software to used car dealerships in a B2B business model. They would also like to target a larger market, the end customer, in a B2C model where end users could use the price predictor to negotiate with car sellers (a freemium model with upsells for more functionality, plus advertisement revenue) - but this is their future market strategy. Their initial target market is used car sellers, where the value proposition is a more accurate understanding of the price of a used car beyond the traditional make, model, and year. Based on this information, the dealership can decide to add a premium on top of the predicted price; it can also opt to reject any car it gets from auction if the price is too high. For this, they are using historical data that contains other variables such as the number of previous owners, the transmission type, etc. (see Data Dictionary). They intend to use this information to build a machine learning model to first prove their hypothesis that other factors do indeed play a role in the price of a used car. They can then productize the model in the form of an app that can be deployed and that will provide an easy-to-use interface for queries and results. The app itself may have other features, such as the history of similar cars sold in the region, but first they have to see if their hunch is correct.
Determine an accurate pricing model that will effectively predict the selling price of used cars in order to enable Cars4U to sell software that will allow its customers to devise profitable revenue strategies using differential pricing in the Indian market.
The most important question is related to determining what are the key factors affecting the selling price of a used car. We need to examine every independent variable and make this determination. In the case where domain knowledge is required, we will use Google.
How confident are we in our findings?
Can we justify our assumptions?
Can we justify our findings?
How will we ensure the accuracy of our chosen model?
Create an accurate and defensible supervised machine learning model that predicts the selling price of a used car based on the features provided in the data dictionary.
S.No. : Serial Number
Name : Name of the car which includes Brand name and Model name
Location : The location in which the car is being sold or is available for purchase (Cities)
Year : Manufacturing year of the car
Kilometers_driven : The total kilometers driven by the previous owner(s)
Fuel_Type : The type of fuel used by the car (Petrol, Diesel, Electric, CNG, LPG)
Transmission : The type of transmission used by the car (Automatic / Manual)
Owner : Type of ownership
Mileage : The standard mileage offered by the car company in kmpl or km/kg
Engine : The displacement volume of the engine in CC
Power : The maximum power of the engine in bhp
Seats : The number of seats in the car
New_Price : The price of a new car of the same model in INR 100,000
Price : The price of the used car in INR 100,000 (Target Variable)
This notebook can be considered a guide to refer to while solving the problem. The evaluation will be as per the Rubric shared for each Milestone. Unlike previous courses, it does not follow the pattern of the graded questions in different sections. This notebook will give you a direction on what steps need to be taken in order to get a viable solution to the problem. Please note that this is just one way of doing this. There can be other 'creative' ways to solve the problem and we urge you to feel free and explore them as an 'optional' exercise.
In the notebook, there are markdown cells called - Observations and Insights. It is a good practice to provide observations and extract insights from the outputs.
The naming convention for different variables can vary. Please consider the code provided in this notebook as a sample code.
All the outputs in the notebook are just for reference and can be different if you follow a different approach.
There are sections called Think About It in the notebook that will help you get a better understanding of the reasoning behind a particular technique/step. Interested learners can take alternative approaches if they wish to explore different techniques.
# Import libraries for data manipulation
import pandas as pd
import numpy as np
# Import libraries for data visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from statsmodels.graphics.gofplots import ProbPlot
# Import libraries for building linear regression model
from statsmodels.formula.api import ols
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
# Import library for preparing data
from sklearn.model_selection import train_test_split
# Import library for data preprocessing
from sklearn.preprocessing import MinMaxScaler
# To ignore warnings
import warnings
warnings.filterwarnings('ignore')
# Remove the limit from the number of displayed columns and rows. It helps to see the entire dataframe while printing it
pd.set_option("display.max_columns", None)
# To get visualization on missing values
#!pip install missingno
import missingno as msno
df = pd.read_csv("used_cars.csv")
df.head()
| | S.No. | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_price | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Maruti Wagon R LXI CNG | Mumbai | 2010 | 72000 | CNG | Manual | First | 26.60 | 998.0 | 58.16 | 5.0 | NaN | 1.75 |
| 1 | 1 | Hyundai Creta 1.6 CRDi SX Option | Pune | 2015 | 41000 | Diesel | Manual | First | 19.67 | 1582.0 | 126.20 | 5.0 | NaN | 12.50 |
| 2 | 2 | Honda Jazz V | Chennai | 2011 | 46000 | Petrol | Manual | First | 18.20 | 1199.0 | 88.70 | 5.0 | 8.61 | 4.50 |
| 3 | 3 | Maruti Ertiga VDI | Chennai | 2012 | 87000 | Diesel | Manual | First | 20.77 | 1248.0 | 88.76 | 7.0 | NaN | 6.00 |
| 4 | 4 | Audi A4 New 2.0 TDI Multitronic | Coimbatore | 2013 | 40670 | Diesel | Automatic | Second | 15.20 | 1968.0 | 140.80 | 5.0 | NaN | 17.74 |
df.tail()
| | S.No. | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_price | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7248 | 7248 | Volkswagen Vento Diesel Trendline | Hyderabad | 2011 | 89411 | Diesel | Manual | First | 20.54 | 1598.0 | 103.6 | 5.0 | NaN | NaN |
| 7249 | 7249 | Volkswagen Polo GT TSI | Mumbai | 2015 | 59000 | Petrol | Automatic | First | 17.21 | 1197.0 | 103.6 | 5.0 | NaN | NaN |
| 7250 | 7250 | Nissan Micra Diesel XV | Kolkata | 2012 | 28000 | Diesel | Manual | First | 23.08 | 1461.0 | 63.1 | 5.0 | NaN | NaN |
| 7251 | 7251 | Volkswagen Polo GT TSI | Pune | 2013 | 52262 | Petrol | Automatic | Third | 17.20 | 1197.0 | 103.6 | 5.0 | NaN | NaN |
| 7252 | 7252 | Mercedes-Benz E-Class 2009-2013 E 220 CDI Avan... | Kochi | 2014 | 72443 | Diesel | Automatic | First | 10.00 | 2148.0 | 170.0 | 5.0 | NaN | NaN |
Observations and Insights: _
The price of the car, indicated by the variable Price, is the target variable. The rest of the variables are independent variables from which we will predict the price of the car. There are NaNs in the dataset, especially in the New_price and Price columns. I also understand what Owner Type means now: whether this is the first, second, third, etc. owner of the car. This is actually valuable information. For example, if a car has changed hands several times during a short period of time, it may indicate a problem with the car. Serial number seems to be a unique identifier for each car, but since we are not looking at individual cars but categories, this column may be unnecessary - to be determined.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7253 entries, 0 to 7252
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   S.No.              7253 non-null   int64
 1   Name               7253 non-null   object
 2   Location           7253 non-null   object
 3   Year               7253 non-null   int64
 4   Kilometers_Driven  7253 non-null   int64
 5   Fuel_Type          7253 non-null   object
 6   Transmission       7253 non-null   object
 7   Owner_Type         7253 non-null   object
 8   Mileage            7251 non-null   float64
 9   Engine             7207 non-null   float64
 10  Power              7078 non-null   float64
 11  Seats              7200 non-null   float64
 12  New_price          1006 non-null   float64
 13  Price              6019 non-null   float64
dtypes: float64(6), int64(3), object(5)
memory usage: 793.4+ KB
#count of unique values in features
df.nunique()
S.No.                7253
Name                 2041
Location               11
Year                   23
Kilometers_Driven    3660
Fuel_Type               5
Transmission            2
Owner_Type              4
Mileage               438
Engine                150
Power                 383
Seats                   8
New_price             625
Price                1373
dtype: int64
# Check total number of missing values of each column. Hint: Use isnull() method
df.isnull().sum()
S.No.                   0
Name                    0
Location                0
Year                    0
Kilometers_Driven       0
Fuel_Type               0
Transmission            0
Owner_Type              0
Mileage                 2
Engine                 46
Power                 175
Seats                  53
New_price            6247
Price                1234
dtype: int64
Observations and Insights: _
We can observe that S.No. has no null values, and its number of unique values equals the number of observations. So, S.No. looks like an index for the data entries; such a column would not provide any predictive power for our analysis, hence it can be dropped.
New_price has by far the most missing values, and Price has a large number as well. Engine, Power and Seats also have missing values, but far fewer.
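The missing-value counts above are easier to judge as percentages. A minimal sketch, run on a tiny synthetic frame (the column names mirror the dataset, the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame standing in for the used-car data (values are illustrative)
df = pd.DataFrame({
    "New_price": [np.nan, 8.61, np.nan, np.nan],
    "Price": [1.75, 4.50, np.nan, 6.00],
    "Seats": [5.0, 5.0, np.nan, 7.0],
})

# Share of missing values per column, worst first
missing_pct = (df.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)  # New_price 75.0, Price 25.0, Seats 25.0
```

On the real dataframe the same two lines would show New_price missing for roughly 86% of rows (6247/7253) and Price for about 17% (1234/7253).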
# Remove S.No. column from data. Hint: Use inplace = True
df.drop(columns=['S.No.'], inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 7253 entries, 0 to 7252 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 7253 non-null object 1 Location 7253 non-null object 2 Year 7253 non-null int64 3 Kilometers_Driven 7253 non-null int64 4 Fuel_Type 7253 non-null object 5 Transmission 7253 non-null object 6 Owner_Type 7253 non-null object 7 Mileage 7251 non-null float64 8 Engine 7207 non-null float64 9 Power 7078 non-null float64 10 Seats 7200 non-null float64 11 New_price 1006 non-null float64 12 Price 6019 non-null float64 dtypes: float64(6), int64(2), object(5) memory usage: 736.8+ KB
df.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Year | 7253.0 | 2013.365366 | 3.254421 | 1996.00 | 2011.000 | 2014.00 | 2016.0000 | 2019.00 |
| Kilometers_Driven | 7253.0 | 58699.063146 | 84427.720583 | 171.00 | 34000.000 | 53416.00 | 73000.0000 | 6500000.00 |
| Mileage | 7251.0 | 18.141580 | 4.562197 | 0.00 | 15.170 | 18.16 | 21.1000 | 33.54 |
| Engine | 7207.0 | 1616.573470 | 595.285137 | 72.00 | 1198.000 | 1493.00 | 1968.0000 | 5998.00 |
| Power | 7078.0 | 112.765214 | 53.493553 | 34.20 | 75.000 | 94.00 | 138.1000 | 616.00 |
| Seats | 7200.0 | 5.280417 | 0.809277 | 2.00 | 5.000 | 5.00 | 5.0000 | 10.00 |
| New_price | 1006.0 | 22.779692 | 27.759344 | 3.91 | 7.885 | 11.57 | 26.0425 | 375.00 |
| Price | 6019.0 | 9.479468 | 11.187917 | 0.44 | 3.500 | 5.64 | 9.9500 | 160.00 |
Observations and Insights: _
We can derive the following observations and insights:
For the 'Year' variable, the distribution is roughly unimodal, with a peak around 2013-2014 and a longer tail toward older years. This indicates that the majority of cars in the dataset were manufactured around 2013-2014, with relatively few older cars.
For Kilometers_Driven, the max appears to be an outlier, since it is highly unlikely that a vehicle has been driven 6.5 million kilometers.
For Mileage, the minimum of 0 is implausible; these zeros likely represent missing values recorded as zero.
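The boxplot whisker rule (1.5 × IQR) gives a quick numeric check for the Kilometers_Driven suspicion. A small sketch on made-up values that include one entry like the 6.5-million-km record:

```python
import pandas as pd

# Synthetic Kilometers_Driven values including one data-entry error
km = pd.Series([34000, 41000, 53416, 72000, 6500000])

# Flag values beyond the usual 1.5 * IQR whiskers used by boxplots
q1, q3 = km.quantile(0.25), km.quantile(0.75)
iqr = q3 - q1
outliers = km[(km < q1 - 1.5 * iqr) | (km > q3 + 1.5 * iqr)]
print(outliers.tolist())  # [6500000]
```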
# Explore basic summary statistics of categorical variables
df.describe(include = ['object']).T

# Plot the distribution of each numerical column
for i in df.select_dtypes(include = np.number).columns:
    plt.figure(figsize = (7, 4))
    sns.histplot(data = df, x = i, kde = True)
    plt.show()
Number of unique observations in each category
cat_cols = df.select_dtypes(include = ['object']).columns
for column in cat_cols:
    print("For column:", column)
    print(df[column].value_counts())
    print('-' * 50)
For column: Name
Mahindra XUV500 W8 2WD 55
Maruti Swift VDI 49
Maruti Swift Dzire VDI 42
Honda City 1.5 S MT 39
Maruti Swift VDI BSIV 37
..
Chevrolet Beat LT Option 1
Skoda Rapid 1.6 MPI AT Elegance Plus 1
Ford EcoSport 1.5 TDCi Ambiente 1
Hyundai i10 Magna 1.1 iTech SE 1
Hyundai Elite i20 Magna Plus 1
Name: Name, Length: 2041, dtype: int64
--------------------------------------------------
For column: Location
Mumbai 949
Hyderabad 876
Coimbatore 772
Kochi 772
Pune 765
Delhi 660
Kolkata 654
Chennai 591
Jaipur 499
Bangalore 440
Ahmedabad 275
Name: Location, dtype: int64
--------------------------------------------------
For column: Fuel_Type
Diesel 3852
Petrol 3325
CNG 62
LPG 12
Electric 2
Name: Fuel_Type, dtype: int64
--------------------------------------------------
For column: Transmission
Manual 5204
Automatic 2049
Name: Transmission, dtype: int64
--------------------------------------------------
For column: Owner_Type
First 5952
Second 1152
Third 137
Fourth & Above 12
Name: Owner_Type, dtype: int64
--------------------------------------------------
Think About It:
Let's explore the two points mentioned above
Check Kilometers_Driven extreme values
# Sort the dataset in 'descending' order using the feature 'Kilometers_Driven'
df.sort_values('Kilometers_Driven', ascending=False).head(10)
| | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_price | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2328 | BMW X5 xDrive 30d M Sport | Chennai | 2017 | 6500000 | Diesel | Automatic | First | 15.97 | 2993.0 | 258.00 | 5.0 | NaN | 65.00 |
| 340 | Skoda Octavia Ambition Plus 2.0 TDI AT | Kolkata | 2013 | 775000 | Diesel | Automatic | First | 19.30 | 1968.0 | 141.00 | 5.0 | NaN | 7.50 |
| 1860 | Volkswagen Vento Diesel Highline | Chennai | 2013 | 720000 | Diesel | Manual | First | 20.54 | 1598.0 | 103.60 | 5.0 | NaN | 5.90 |
| 358 | Hyundai i10 Magna 1.2 | Chennai | 2009 | 620000 | Petrol | Manual | First | 20.36 | 1197.0 | 78.90 | 5.0 | NaN | 2.70 |
| 2823 | Volkswagen Jetta 2013-2015 2.0L TDI Highline AT | Chennai | 2015 | 480000 | Diesel | Automatic | First | 16.96 | 1968.0 | 138.03 | 5.0 | NaN | 13.00 |
| 3092 | Honda City i VTEC SV | Kolkata | 2015 | 480000 | Petrol | Manual | First | 17.40 | 1497.0 | 117.30 | 5.0 | NaN | 5.00 |
| 4491 | Hyundai i20 Magna Optional 1.2 | Bangalore | 2013 | 445000 | Petrol | Manual | First | 18.50 | 1197.0 | 82.90 | 5.0 | NaN | 4.45 |
| 6921 | Maruti Swift Dzire Tour LDI | Jaipur | 2012 | 350000 | Diesel | Manual | First | 23.40 | 1248.0 | 74.00 | 5.0 | NaN | NaN |
| 3649 | Tata Indigo LS | Jaipur | 2008 | 300000 | Diesel | Manual | First | 17.00 | 1405.0 | 70.00 | 5.0 | NaN | 1.00 |
| 1528 | Toyota Innova 2.5 G (Diesel) 8 Seater BS IV | Hyderabad | 2005 | 299322 | Diesel | Manual | First | 12.80 | 2494.0 | 102.00 | 8.0 | NaN | 4.00 |
In the first row, a car manufactured as recently as 2017 having been driven 6,500,000 km is next to impossible. We can treat this as a data-entry error and remove the row from the data.
# Removing the 'row' at index 2328 from the data. Hint: use the argument inplace=True
df.drop(index=2328, inplace=True)
Check Mileage extreme values
# Sort the dataset in 'ascending' order using the feature 'Mileage'
df.sort_values('Mileage').head(10)
| | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_price | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2597 | Hyundai Santro Xing XP | Pune | 2007 | 70000 | Petrol | Manual | First | 0.0 | 1086.0 | NaN | 5.0 | NaN | 1.12 |
| 2343 | Hyundai Santro AT | Hyderabad | 2006 | 74483 | Petrol | Automatic | First | 0.0 | 999.0 | NaN | 5.0 | NaN | 2.30 |
| 5270 | Honda City 1.5 GXI | Bangalore | 2002 | 53000 | Petrol | Manual | Second | 0.0 | NaN | NaN | NaN | NaN | 1.85 |
| 424 | Volkswagen Jetta 2007-2011 1.9 L TDI | Hyderabad | 2010 | 42021 | Diesel | Manual | First | 0.0 | 1968.0 | NaN | 5.0 | NaN | 5.45 |
| 6857 | Land Rover Freelander 2 TD4 SE | Mumbai | 2011 | 87000 | Diesel | Automatic | First | 0.0 | 2179.0 | 115.0 | 5.0 | NaN | NaN |
| 443 | Hyundai Santro GLS I - Euro I | Coimbatore | 2012 | 50243 | Petrol | Manual | First | 0.0 | 1086.0 | NaN | 5.0 | NaN | 3.35 |
| 5119 | Hyundai Santro Xing XP | Kolkata | 2008 | 45500 | Petrol | Manual | Second | 0.0 | 1086.0 | NaN | 5.0 | NaN | 1.17 |
| 5022 | Land Rover Freelander 2 TD4 SE | Hyderabad | 2013 | 46000 | Diesel | Automatic | Second | 0.0 | 2179.0 | 115.0 | 5.0 | NaN | 26.00 |
| 5016 | Land Rover Freelander 2 TD4 HSE | Delhi | 2013 | 72000 | Diesel | Automatic | First | 0.0 | 2179.0 | 115.0 | 5.0 | NaN | 15.50 |
| 2542 | Hyundai Santro GLS II - Euro II | Bangalore | 2011 | 65000 | Petrol | Manual | Second | 0.0 | NaN | NaN | NaN | NaN | 3.15 |
Univariate analysis is used to explore each variable in a data set, separately. It looks at the range of values, as well as the central tendency of the values. It can be done for both numerical and categorical variables.
Histograms and box plots help to visualize and describe numerical data. We use box plot and histogram to analyse the numerical columns.
# Let us write a function that will help us create a boxplot and histogram for any input numerical variable.
# This function takes the numerical column as the input and returns the boxplots and histograms for the variable.
def histogram_boxplot(feature, figsize = (15, 10), bins = None):
    """ Boxplot and histogram combined
    feature: 1-d feature array
    figsize: size of fig (default (15, 10))
    bins: number of bins (default None / auto)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2,  # Number of rows of the subplot grid = 2
                                           sharex = True,  # X-axis will be shared among all subplots
                                           gridspec_kw = {"height_ratios": (.25, .75)},
                                           figsize = figsize
                                           )  # Creating the 2 subplots
    sns.boxplot(x = feature, ax = ax_box2, showmeans = True, color = 'violet')  # Boxplot with a symbol indicating the mean value of the column
    sns.distplot(feature, kde = False, ax = ax_hist2, bins = bins) if bins else sns.distplot(feature, kde = False, ax = ax_hist2)  # For histogram
    ax_hist2.axvline(np.mean(feature), color = 'green', linestyle = '--')  # Add mean to the histogram
    ax_hist2.axvline(np.median(feature), color = 'black', linestyle = '-')  # Add median to the histogram
Let us plot histogram and box-plot for the feature 'Kilometers_Driven' to understand the distribution and outliers, if any.
# Plot histogram and box-plot for 'Kilometers_Driven'
histogram_boxplot(df['Kilometers_Driven'])
Think About It: Kilometers_Driven is highly right-skewed. Can we use Log transformation of the feature to reduce/remove the skewness? Why can't we keep skewed data?
Log transformation can be used to reduce the skewness of a feature, in particular when the feature is highly right-skewed. It can make the distribution more symmetric and closer to normal, and by compressing the effect of extreme values it also makes the data easier to interpret.
However, a heavily right-skewed feature can cause issues when building models, because the model may be overly sensitive to the outliers in the right tail. This can lead to overfitting or poor generalization performance.
Furthermore, skewed data can lead to inaccurate estimates of model parameters and a decrease in the power of statistical tests. In particular, inference for linear regression assumes that the residuals are approximately normally distributed, and a heavily skewed target or feature often produces skewed residuals, leading to biased or inefficient estimates of the model parameters.
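The effect is easy to verify numerically with pandas' skew(). A sketch on a synthetic log-normal sample (similar in shape to Kilometers_Driven; the distribution parameters are made up):

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed sample, roughly the shape of Kilometers_Driven
rng = np.random.default_rng(0)
km = pd.Series(rng.lognormal(mean=10.8, sigma=0.6, size=1000))

# Skewness before vs after the log transform
print(km.skew(), np.log(km).skew())  # strongly positive vs close to 0
```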
# Log transformation of the feature 'Kilometers_Driven'
sns.distplot(np.log(df["Kilometers_Driven"]), axlabel = "Log(Kilometers_Driven)");
Observations and Insights: _
This is better, but the distribution now shows a slight left skew.
# We can add a transformed kilometers_driven feature in data
df["kilometers_driven_log"] = np.log(df["Kilometers_Driven"])
Note: Like Kilometers_Driven, the distribution of Price is also highly skewed. We can use a log transformation on this column to see if that helps normalize the distribution, and then add the transformed variable to the dataset. You can name the variable 'price_log'.
# Plot histogram and box-plot for 'Price'
histogram_boxplot(df['Price'])
# Log transformation of the feature 'Price'
sns.distplot(np.log(df["Price"]), axlabel = "Log(Price)")
<AxesSubplot:xlabel='Log(Price)', ylabel='Density'>
# We can Add a transformed Price feature in data
df["price_log"] = np.log(df["Price"])
Note: Try plotting histogram and box-plot for different numerical features and understand how the data looks like.
# Plot histogram and box-plot for 'Mileage'
histogram_boxplot(df['Mileage'])
# Log transformation of the feature 'Mileage'
#sns.distplot(np.log(df["Mileage"]), axlabel = "Log(Mileage)")
# This code fails because np.log(0) evaluates to -inf for the zero Mileage entries
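One way around this, assuming the zeros really are missing readings, is to replace them with NaN before taking the log, so they surface as missing values instead of -inf. A minimal sketch on made-up Mileage values:

```python
import numpy as np
import pandas as pd

# Synthetic Mileage values; 0 kmpl is implausible and stands in for "missing"
mileage = pd.Series([26.6, 0.0, 18.2, 20.8])

# np.log(0) evaluates to -inf, so mask zeros as NaN before transforming
mileage_log = np.log(mileage.replace(0, np.nan))
print(mileage_log.isnull().sum())  # 1
```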
# Plot histogram and box-plot for 'Engine'
histogram_boxplot(df['Engine'])
# Log transformation of the feature 'Engine'
sns.distplot(np.log(df["Engine"]), axlabel = "Log(Engine)");
# Plot histogram and box-plot for 'Power'
histogram_boxplot(df['Power'])
# Log transformation of the feature 'Power'
sns.distplot(np.log(df["Power"]), axlabel = "Log(Power)");
# Plot histogram and box-plot for 'New_price'
histogram_boxplot(df['New_price'])
# Log transformation of the feature 'New_price'
sns.distplot(np.log(df["New_price"]), axlabel = "Log(New_price)");
# We can Add a transformed New_price feature in data
df["new_price_log"] = np.log(df["New_price"])
# Creating histograms
df.hist(figsize = (14, 14))
plt.show()
df['Year'] = df['Year'].astype('object')
df['Seats'] = df['Seats'].astype('object')
# Creating histograms
df.hist(figsize = (14, 14))
df.info()
plt.show()
<class 'pandas.core.frame.DataFrame'> Int64Index: 7252 entries, 0 to 7252 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 7252 non-null object 1 Location 7252 non-null object 2 Year 7252 non-null object 3 Kilometers_Driven 7252 non-null int64 4 Fuel_Type 7252 non-null object 5 Transmission 7252 non-null object 6 Owner_Type 7252 non-null object 7 Mileage 7250 non-null float64 8 Engine 7206 non-null float64 9 Power 7077 non-null float64 10 Seats 7199 non-null object 11 New_price 1006 non-null float64 12 Price 6018 non-null float64 13 kilometers_driven_log 7252 non-null float64 14 price_log 6018 non-null float64 15 new_price_log 1006 non-null float64 dtypes: float64(8), int64(1), object(7) memory usage: 963.2+ KB
Observations and Insights for all the plots: _
Mileage is roughly left-skewed, so we could transform it. However, the zeros need to be cleared out first before a log transformation is possible.
Engine is multimodal. Need to check how to handle this.
Power is bimodal. Need to check how to handle this.
New_price is bimodal. Need to check how to handle this.
We are correct in transforming Price and Kilometers_Driven.
Year and Seats are numerical variables, but they behave like categories, so it is better to convert them to object variables.
# Let us write a function that will help us create barplots that indicate the percentage for each category.
# This function takes the categorical column as the input and returns the barplots for the variable.
def perc_on_bar(z):
    '''
    plot
    feature: categorical feature
    the function won't work if a column is passed in the hue parameter
    '''
    total = len(df[z])  # Length of the column
    plt.figure(figsize = (15, 5))
    ax = sns.countplot(x = df[z], palette = 'Paired', order = df[z].value_counts().index)
    for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_height() / total)  # Percentage of each class of the category
        x = p.get_x() + p.get_width() / 2 - 0.05  # x-coordinate of the annotation
        y = p.get_y() + p.get_height()  # Height of the bar
        ax.annotate(percentage, (x, y), size = 12)  # Annotate the percentage
    plt.show()  # Show the plot
Let us plot a barplot for the variable Location. It will be helpful to know the percentage of cars from each city.
# Bar Plot for 'Location'
perc_on_bar('Location')
Note: Explore other variables like Year, Fuel_Type, Transmission, and Owner_Type.
# Bar Plot for 'Name'
perc_on_bar('Name')
# Bar Plot for 'Year'
perc_on_bar('Year')
# Bar Plot for 'Fuel_Type'
perc_on_bar('Fuel_Type')
# Bar Plot for 'Transmission'
perc_on_bar('Transmission')
# Bar Plot for 'Owner_Type'
perc_on_bar('Owner_Type')
# Bar Plot for 'Seats'
perc_on_bar('Seats')
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 7252 entries, 0 to 7252 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 7252 non-null object 1 Location 7252 non-null object 2 Year 7252 non-null object 3 Kilometers_Driven 7252 non-null int64 4 Fuel_Type 7252 non-null object 5 Transmission 7252 non-null object 6 Owner_Type 7252 non-null object 7 Mileage 7250 non-null float64 8 Engine 7206 non-null float64 9 Power 7077 non-null float64 10 Seats 7199 non-null object 11 New_price 1006 non-null float64 12 Price 6018 non-null float64 13 kilometers_driven_log 7252 non-null float64 14 price_log 6018 non-null float64 15 new_price_log 1006 non-null float64 dtypes: float64(8), int64(1), object(7) memory usage: 963.2+ KB
Observations and Insights from all plots: _
Most cars in the dataset are 5-seaters.
Most cars are sold by the original owner.
Most cars have a manual transmission.
Diesel and petrol cars are roughly equal in number (with petrol cars being slightly fewer).
The Year distribution has a long tail of older cars; most cars in the dataset are relatively recent.
Some brands are more popular than others, but there are many brands for sale.
Over 60% of the used cars for sale are in Mumbai, Hyderabad, Coimbatore, Kochi and Pune.
The Name column has too many unique values to use as-is; we need to either drop it or transform it into categorical features.
A scatter plot allows us to see relationships between two variables.
Note: Use log transformed values 'kilometers_driven_log' and 'price_log'
# Let us plot a scatter plot of 'Year' against 'price_log'
df.plot(x = 'price_log', y = 'Year', style = 'o')
<AxesSubplot:xlabel='price_log'>
Note: Try to explore different combinations of independent variables and dependent variable. Understand the relationship between all variables.
sns.pairplot(df)
plt.show()
Observations and Insights from all plots: _
A heat map shows the 2D correlation matrix among the numerical features.
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 7252 entries, 0 to 7252 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 7252 non-null object 1 Location 7252 non-null object 2 Year 7252 non-null object 3 Kilometers_Driven 7252 non-null int64 4 Fuel_Type 7252 non-null object 5 Transmission 7252 non-null object 6 Owner_Type 7252 non-null object 7 Mileage 7250 non-null float64 8 Engine 7206 non-null float64 9 Power 7077 non-null float64 10 Seats 7199 non-null object 11 New_price 1006 non-null float64 12 Price 6018 non-null float64 13 kilometers_driven_log 7252 non-null float64 14 price_log 6018 non-null float64 15 new_price_log 1006 non-null float64 dtypes: float64(8), int64(1), object(7) memory usage: 963.2+ KB
# We can include the log transformation values and drop the original skewed data columns
plt.figure(figsize = (12, 7))
sns.heatmap(df.drop(columns=['Kilometers_Driven','Price'],axis = 1).corr(), annot = True, vmin = -1, vmax = 1)
plt.show()
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 7252 entries, 0 to 7252 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 7252 non-null object 1 Location 7252 non-null object 2 Year 7252 non-null object 3 Kilometers_Driven 7252 non-null int64 4 Fuel_Type 7252 non-null object 5 Transmission 7252 non-null object 6 Owner_Type 7252 non-null object 7 Mileage 7250 non-null float64 8 Engine 7206 non-null float64 9 Power 7077 non-null float64 10 Seats 7199 non-null object 11 New_price 1006 non-null float64 12 Price 6018 non-null float64 13 kilometers_driven_log 7252 non-null float64 14 price_log 6018 non-null float64 15 new_price_log 1006 non-null float64 dtypes: float64(8), int64(1), object(7) memory usage: 963.2+ KB
Observations and Insights: _
Mileage is negatively correlated with a number of variables, especially Engine, new_price_log and Power. Engine is strongly positively correlated with Power and new_price_log.
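To read the heatmap off programmatically, the correlation of every numeric feature with the target can be sorted directly. A sketch on a synthetic numeric frame (the relationships are contrived to mimic the ones observed above):

```python
import numpy as np
import pandas as pd

# Synthetic frame: Power drives price up, Mileage falls as Power rises
rng = np.random.default_rng(1)
power = rng.uniform(50, 300, 200)
df = pd.DataFrame({
    "Power": power,
    "Mileage": 30 - 0.05 * power + rng.normal(0, 1, 200),
    "price_log": np.log(5 + 0.05 * power + rng.normal(0, 1, 200)),
})

# Correlation of each feature with the target, strongest positive first
corr_with_target = df.corr()["price_log"].drop("price_log").sort_values(ascending=False)
print(corr_with_target)
```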
# Let us write a function that will help us create boxplot w.r.t Price for any input categorical variable.
# This function takes the categorical column as the input and returns the boxplots for the variable.
def boxplot(z):
    plt.figure(figsize = (12, 5))  # Setting size of boxplot
    sns.boxplot(x = z, y = df['Price'])  # Defining x and y
    plt.show()
    plt.figure(figsize = (12, 5))
    plt.title('Without Outliers')
    sns.boxplot(x = z, y = df['Price'], showfliers = False)  # Turning off the outliers
    plt.show()
# Box Plot: Price vs Location
boxplot(df['Location'])
Note: Explore by plotting box-plots for the target variable against the other categorical variables like Fuel_Type, Transmission, and Owner_Type.
boxplot(df['Fuel_Type'])
boxplot(df['Owner_Type'])
boxplot(df['Year'])
boxplot(df['Seats'])
boxplot(df['Transmission'])
Observations and Insights for all plots: _
Automatic cars are more expensive than manual.
2-seaters are the most expensive cars. This could be because they are sports cars, which are luxury items. They are followed by 4-seater cars; we would need domain knowledge to understand why.
Newer cars are pricier. Not surprising.
First-owner cars are more expensive. These tend to be newer cars, so this also makes sense.
Diesel cars are more expensive than petrol cars.
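These boxplot readings can be double-checked numerically with a simple groupby. A sketch on a tiny synthetic slice (the prices are invented but follow the pattern in the plots):

```python
import pandas as pd

# Tiny synthetic slice; prices are illustrative, pattern matches the boxplots
df = pd.DataFrame({
    "Transmission": ["Automatic", "Manual", "Automatic", "Manual", "Manual"],
    "Price": [17.74, 1.75, 12.50, 4.50, 6.00],
})

# Median price per transmission type
median_by_trans = df.groupby("Transmission")["Price"].median()
print(median_by_trans)  # Automatic above Manual
```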
Think about it: The Name column in the current format might not be very useful in our analysis.
Since the name contains both the brand name and the model name of the vehicle, the column would have too many unique values to be useful in prediction. Can we extract that information from that column?
# Car name includes both brand and model.
# We extract them here, as the brand will help fill missing values of the price column
df['Brand'] = df['Name'].str.split(' ').str[0]  # Separating the brand name from the Name
df['Model'] = df['Name'].str.split(' ').str[1] + df['Name'].str.split(' ').str[2]
#check
df.head().T
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Name | Maruti Wagon R LXI CNG | Hyundai Creta 1.6 CRDi SX Option | Honda Jazz V | Maruti Ertiga VDI | Audi A4 New 2.0 TDI Multitronic |
| Location | Mumbai | Pune | Chennai | Chennai | Coimbatore |
| Year | 2010 | 2015 | 2011 | 2012 | 2013 |
| Kilometers_Driven | 72000 | 41000 | 46000 | 87000 | 40670 |
| Fuel_Type | CNG | Diesel | Petrol | Diesel | Diesel |
| Transmission | Manual | Manual | Manual | Manual | Automatic |
| Owner_Type | First | First | First | First | Second |
| Mileage | 26.6 | 19.67 | 18.2 | 20.77 | 15.2 |
| Engine | 998.0 | 1582.0 | 1199.0 | 1248.0 | 1968.0 |
| Power | 58.16 | 126.2 | 88.7 | 88.76 | 140.8 |
| Seats | 5.0 | 5.0 | 5.0 | 7.0 | 5.0 |
| New_price | NaN | NaN | 8.61 | NaN | NaN |
| Price | 1.75 | 12.5 | 4.5 | 6.0 | 17.74 |
| kilometers_driven_log | 11.184421 | 10.621327 | 10.736397 | 11.373663 | 10.613246 |
| price_log | 0.559616 | 2.525729 | 1.504077 | 1.791759 | 2.875822 |
| new_price_log | NaN | NaN | 2.152924 | NaN | NaN |
| Brand | Maruti | Hyundai | Honda | Maruti | Audi |
| Model | WagonR | Creta1.6 | JazzV | ErtigaVDI | A4New |
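As a sketch of a slightly more defensive variant of this split (the `names` Series here is illustrative, not the project data): `str.split` with `expand=True` yields one column per token, so missing tokens can be filled before concatenation instead of producing NaN:

```python
import pandas as pd

# Hypothetical sample mimicking the Name column
names = pd.Series(["Maruti Wagon R LXI CNG", "Honda Jazz V", "Tata Nano"])

parts = names.str.split(" ", expand=True)  # One column per whitespace-separated token
brand = parts[0]                           # First token is the brand
# Concatenate the next two tokens, treating a missing third token as empty
model = parts[1].fillna("") + parts[2].fillna("")
```

With the original `str[1] + str[2]` approach, a two-token name like "Tata Nano" would yield a missing Model, which is likely the cause of the single missing Model value seen below.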
# Now let's check the unique brand names
df.Brand.unique()
array(['Maruti', 'Hyundai', 'Honda', 'Audi', 'Nissan', 'Toyota',
'Volkswagen', 'Tata', 'Land', 'Mitsubishi', 'Renault',
'Mercedes-Benz', 'BMW', 'Mahindra', 'Ford', 'Porsche', 'Datsun',
'Jaguar', 'Volvo', 'Chevrolet', 'Skoda', 'Mini', 'Fiat', 'Jeep',
'Smart', 'Ambassador', 'Isuzu', 'ISUZU', 'Force', 'Bentley',
'Lamborghini', 'Hindustan', 'OpelCorsa'], dtype=object)
# There seem to be some issues with the unique brand names:
# - 'Isuzu' and 'ISUZU' are the same brand
# - 'Land' should be 'Land Rover', which has a number of models
# - 'Mini' also has a number of models
col = ['ISUZU', 'Isuzu', 'Mini', 'Land']
# Let's take a sample and check our suspicions
df[df.Brand.isin(col)].sample(5)
| Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_price | Price | kilometers_driven_log | price_log | new_price_log | Brand | Model | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2604 | Mini Cooper Convertible S | Mumbai | 2016 | 15000 | Petrol | Automatic | First | 16.82 | 1998.0 | 189.08 | 4.0 | 44.28 | 35.00 | 9.615805 | 3.555348 | 3.790533 | Mini | CooperConvertible |
| 5545 | Land Rover Range Rover Sport SE | Delhi | 2014 | 47000 | Diesel | Automatic | Second | 12.65 | 2993.0 | 255.00 | 5.0 | 139.00 | 64.75 | 10.757903 | 4.170534 | 4.934474 | Land | RoverRange |
| 1460 | Land Rover Range Rover Sport 2005 2012 Sport | Coimbatore | 2008 | 69078 | Petrol | Manual | First | 0.00 | NaN | NaN | NaN | NaN | 40.88 | 11.142992 | 3.710641 | NaN | Land | RoverRange |
| 2073 | Mini Cooper 5 DOOR D | Hyderabad | 2017 | 2000 | Diesel | Automatic | First | 20.70 | 1496.0 | 113.98 | 5.0 | 42.48 | 34.00 | 7.600902 | 3.526361 | 3.749033 | Mini | Cooper5 |
| 718 | Mini Cooper S | Pune | 2012 | 37000 | Petrol | Automatic | Second | 13.60 | 1598.0 | 181.00 | 4.0 | NaN | 17.00 | 10.518673 | 2.833213 | NaN | Mini | CooperS |
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7252 entries, 0 to 7252
Data columns (total 18 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Name                   7252 non-null   object
 1   Location               7252 non-null   object
 2   Year                   7252 non-null   object
 3   Kilometers_Driven      7252 non-null   int64
 4   Fuel_Type              7252 non-null   object
 5   Transmission           7252 non-null   object
 6   Owner_Type             7252 non-null   object
 7   Mileage                7250 non-null   float64
 8   Engine                 7206 non-null   float64
 9   Power                  7077 non-null   float64
 10  Seats                  7199 non-null   object
 11  New_price              1006 non-null   float64
 12  Price                  6018 non-null   float64
 13  kilometers_driven_log  7252 non-null   float64
 14  price_log              6018 non-null   float64
 15  new_price_log          1006 non-null   float64
 16  Brand                  7252 non-null   object
 17  Model                  7251 non-null   object
dtypes: float64(8), int64(1), object(9)
memory usage: 1.1+ MB
# Let's standardize the brand names so that they are consistent in the dataset
df.loc[df.Brand == 'ISUZU','Brand']='Isuzu'
df.loc[df.Brand=='Mini','Brand']='Mini Cooper'
df.loc[df.Brand=='Land','Brand']='Land Rover'
df.Brand.nunique()
32
df.groupby(df.Brand).size().sort_values(ascending =False)
Brand
Maruti           1444
Hyundai          1340
Honda             743
Toyota            507
Mercedes-Benz     380
Volkswagen        374
Ford              351
Mahindra          331
BMW               311
Audi              285
Tata              228
Skoda             202
Renault           170
Chevrolet         151
Nissan            117
Land Rover         67
Jaguar             48
Fiat               38
Mitsubishi         36
Mini Cooper        31
Volvo              28
Jeep               19
Porsche            19
Datsun             17
Isuzu               5
Force               3
Bentley             2
Lamborghini         1
OpelCorsa           1
Hindustan           1
Smart               1
Ambassador          1
dtype: int64
df.Model.isnull().sum()
1
# We notice there is one model missing from the dataset. Let's drop that row.
df.dropna(subset=['Model'],axis=0,inplace=True)
df.Model.nunique()
726
#Let's examine the most popular model
df.groupby('Model')['Model'].size().nlargest(30)
Model
SwiftDzire      189
Grandi10        179
WagonR          178
Innova2.5       145
Verna1.6        127
City1.5         122
Cityi           115
Creta1.6        110
NewC-Class      110
3Series         109
SwiftVDI         96
5Series          86
i201.2           78
SantroXing       76
XUV500W8         75
i10Sportz        75
AmazeS           69
i10Magna         69
Alto800          63
CorollaAltis     63
FigoDiesel       61
Ecosport1.5      59
A42.0            56
AltoK10          56
VitaraBrezza     55
i20Asta          54
InnovaCrysta     53
i20Sportz        53
Duster110PS      51
Fortuner4x2      50
Name: Model, dtype: int64
There are 32 unique brands in this dataset; Maruti and Hyundai dominate in terms of listings. There are 726 unique models. The most popular model is the SwiftDzire, followed by the Grandi10 and WagonR.
# Now check the missing values of each column. Hint: Use isnull() method
df.isnull().sum()
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     2
Engine                     46
Power                     175
Seats                      53
New_price                6245
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6245
Brand                       0
Model                       0
dtype: int64
#Let's see this graphically
msno.bar(df)
<AxesSubplot:>
Missing values for Seats
# Checking missing values in the column 'Seats'
df['Seats'].isnull().sum()
53
Think about it: Can we somehow use the extracted information from 'Name' column to impute missing values?
Hint: Impute these missing values one by one, by taking median number of seats for the particular car, using the Brand and Model name.
#Group by Name and fill each group's missing Seats with the group median
df['Seats']=df.groupby(['Name'])['Seats'].apply(lambda x:x.fillna(x.median()))
df['Seats'].isnull().sum()
46
#Now let's try grouping by Model
df['Seats']=df.groupby(['Model'])['Seats'].apply(lambda x:x.fillna(x.median()))
df['Seats'].isnull().sum()
22
#Let's check which cars still have missing Seats values
df[df['Seats'].isnull()==True].head(10)
| Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_price | Price | kilometers_driven_log | price_log | new_price_log | Brand | Model | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 208 | Maruti Swift 1.3 VXi | Kolkata | 2010 | 42001 | Petrol | Manual | First | 16.1 | NaN | NaN | NaN | NaN | 2.11 | 10.645449 | 0.746688 | NaN | Maruti | Swift1.3 |
| 733 | Maruti Swift 1.3 VXi | Chennai | 2006 | 97800 | Petrol | Manual | Third | 16.1 | NaN | NaN | NaN | NaN | 1.75 | 11.490680 | 0.559616 | NaN | Maruti | Swift1.3 |
| 1327 | Maruti Swift 1.3 ZXI | Hyderabad | 2015 | 50295 | Petrol | Manual | First | 16.1 | NaN | NaN | NaN | NaN | 5.80 | 10.825661 | 1.757858 | NaN | Maruti | Swift1.3 |
| 2074 | Maruti Swift 1.3 LXI | Pune | 2011 | 24255 | Petrol | Manual | First | 16.1 | NaN | NaN | NaN | NaN | 3.15 | 10.096378 | 1.147402 | NaN | Maruti | Swift1.3 |
| 2325 | Maruti Swift 1.3 VXI ABS | Pune | 2015 | 67000 | Petrol | Manual | First | 16.1 | NaN | NaN | NaN | NaN | 4.70 | 11.112448 | 1.547563 | NaN | Maruti | Swift1.3 |
| 2335 | Maruti Swift 1.3 VXi | Mumbai | 2007 | 55000 | Petrol | Manual | Second | 16.1 | NaN | NaN | NaN | NaN | 1.75 | 10.915088 | 0.559616 | NaN | Maruti | Swift1.3 |
| 2369 | Maruti Estilo LXI | Chennai | 2008 | 56000 | Petrol | Manual | Second | 19.5 | 1061.0 | NaN | NaN | NaN | 1.50 | 10.933107 | 0.405465 | NaN | Maruti | EstiloLXI |
| 2668 | Maruti Swift 1.3 VXi | Kolkata | 2014 | 32986 | Petrol | Manual | First | 16.1 | NaN | NaN | NaN | NaN | 4.24 | 10.403839 | 1.444563 | NaN | Maruti | Swift1.3 |
| 3404 | Maruti Swift 1.3 VXi | Jaipur | 2006 | 125000 | Petrol | Manual | Fourth & Above | 16.1 | NaN | NaN | NaN | NaN | 2.35 | 11.736069 | 0.854415 | NaN | Maruti | Swift1.3 |
| 3810 | Honda CR-V AT With Sun Roof | Kolkata | 2013 | 27000 | Petrol | Automatic | First | 14.0 | NaN | NaN | NaN | NaN | 11.99 | 10.203592 | 2.484073 | NaN | Honda | CR-VAT |
# Now check total number of missing values of the seat column to verify if they are imputed or not. Hint: Use isnull() method
df['Seats'].isnull().sum()
22
# A Google search (https://www.cardekho.com/maruti/swift/specs) shows that the Maruti Swift 1.3 is a 5-seater.
# Another search (https://www.cardekho.com/overview/Maruti_Zen_Estilo/Maruti_Zen_Estilo_LXI_BS_IV.htm) shows the Maruti Zen Estilo LXI is also a 5-seater.
# Impute all remaining missing values with 5 seats
df['Seats']=df['Seats'].fillna(5)
#Check if imputed
df['Seats'].isnull().sum()
0
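The multi-step imputation above (group by Name, then by Model, then a constant) can also be expressed as a single fallback chain using `groupby().transform`, which returns a value per row aligned to the original index. A minimal sketch on toy data (`df_toy` is illustrative, not the Cars4U data):

```python
import pandas as pd

# Toy frame standing in for the Cars4U data
df_toy = pd.DataFrame({
    "Model": ["A", "A", "A", "B", "C"],
    "Seats": [5.0, None, 5.0, None, None],
})

# transform("median") computes the per-group median and aligns it row-by-row,
# so it can be used directly as a fillna source; a constant is the final fallback
group_median = df_toy.groupby("Model")["Seats"].transform("median")
df_toy["Seats"] = df_toy["Seats"].fillna(group_median).fillna(5)
```

Groups whose values are entirely missing (here "B" and "C") get NaN from the group median and fall through to the constant, mirroring the manual steps above.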
Missing values for Mileage
#Let's look at how many missing values there are for Engine, Power and Mileage
col=['Engine','Power','Mileage']
df[col].isnull().sum()
Engine     46
Power     175
Mileage     2
dtype: int64
#Let's start by grouping on Name and Year and filling missing values
#with each group's median.
df.groupby(['Name','Year'])['Engine'].median().head(30)
df['Engine']=df.groupby(['Name','Year'])['Engine'].apply(lambda x:x.fillna(x.median()))
df['Power']=df.groupby(['Name','Year'])['Power'].apply(lambda x:x.fillna(x.median()))
df['Mileage']=df.groupby(['Name','Year'])['Mileage'].apply(lambda x:x.fillna(x.median()))
col=['Engine','Power','Mileage']
df[col].isnull().sum()
Engine     45
Power     162
Mileage     2
dtype: int64
#Let's look at each unique combination of Brand and Model and display the top 10 results.
df.groupby(['Brand','Model'])['Engine'].median().head(10)
Brand Model
Ambassador ClassicNova 1489.0
Audi A335 1968.0
A41.8 1781.0
A42.0 1968.0
A43.0 2967.0
A43.2 3197.0
A430 1395.0
A435 1968.0
A4New 1968.0
A62.0 1968.0
Name: Engine, dtype: float64
# Now check missing values of each column. Hint: Use isnull() method
df.isnull().sum()
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     2
Engine                     45
Power                     162
Seats                       0
New_price                6245
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6245
Brand                       0
Model                       0
dtype: int64
# Impute missing Mileage. For example, we can use the median or other methods.
# This was treated above.
df['Mileage'].isnull().sum()
2
#Since these are only 2 records, we can drop them
df.dropna(subset=['Mileage'],axis=0,inplace=True)
df.isnull().sum()
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     0
Engine                     45
Power                     162
Seats                       0
New_price                6244
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6244
Brand                       0
Model                       0
dtype: int64
Missing values for Engine
df['Engine'].isnull().sum()
45
#Let's look at the median, mean and max for Engine to see if we can impute with one of these values.
df.groupby(['Model','Year'])['Engine'].agg({'median','mean','max'}).sort_values(by='Model',ascending=True).head(10)
| max | mean | median | ||
|---|---|---|---|---|
| Model | Year | |||
| 1000AC | 1998 | 970.0 | 970.000000 | 970.0 |
| 1Series | 2013 | 1995.0 | 1995.000000 | 1995.0 |
| 2015 | 1995.0 | 1995.000000 | 1995.0 | |
| 370ZAT | 2012 | 3696.0 | 3696.000000 | 3696.0 |
| 3Series | 2018 | 1995.0 | 1995.000000 | 1995.0 |
| 2017 | 1995.0 | 1995.000000 | 1995.0 | |
| 2016 | 1995.0 | 1995.000000 | 1995.0 | |
| 2015 | 1995.0 | 1995.000000 | 1995.0 | |
| 2014 | 2993.0 | 2078.166667 | 1995.0 | |
| 2013 | 2993.0 | 2066.428571 | 1995.0 |
df.isnull().sum()
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     0
Engine                     45
Power                     162
Seats                       0
New_price                6244
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6244
Brand                       0
Model                       0
dtype: int64
# Let's impute the remaining missing Engine values with the overall column median
df['Engine'] = df['Engine'].fillna(df['Engine'].median())
#Check if imputed
df['Engine'].isnull().sum()
0
Missing values for Power
df['Power'].isnull().sum()
162
df.groupby(['Model','Year'])['Power'].agg({'median','mean','max'}).sort_values(by='Model',ascending=True).head(10)
| max | mean | median | ||
|---|---|---|---|---|
| Model | Year | |||
| 1000AC | 1998 | NaN | NaN | NaN |
| 1Series | 2013 | 143.0 | 143.000000 | 143.0 |
| 2015 | 143.0 | 143.000000 | 143.0 | |
| 370ZAT | 2012 | 328.5 | 328.500000 | 328.5 |
| 3Series | 2018 | 190.0 | 188.000000 | 190.0 |
| 2017 | 190.0 | 188.820000 | 190.0 | |
| 2016 | 190.0 | 189.333333 | 190.0 | |
| 2015 | 190.0 | 185.981429 | 184.0 | |
| 2014 | 245.0 | 189.666667 | 184.0 | |
| 2013 | 245.0 | 191.785714 | 184.0 |
# Impute the remaining missing Power values with the overall column median
df['Power'] = df['Power'].fillna(df['Power'].median())
df['Power'].isnull().sum()
0
Missing values for New_price
df['New_price'].isnull().sum()
6244
df.isnull().sum()
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     0
Engine                      0
Power                       0
Seats                       0
New_price                6244
Price                    1233
kilometers_driven_log       0
price_log                1233
new_price_log            6244
Brand                       0
Model                       0
dtype: int64
#New_price has too many missing values to impute reliably, so we drop the
#New_price and new_price_log columns.
#We will also drop records missing the target variable Price, rather than
#impute it, so as not to introduce bias.
df.drop(columns = ["new_price_log"], inplace = True)
df.drop(columns = ["New_price"], inplace = True)
df.shape
(7249, 16)
df.isnull().sum()
Name                        0
Location                    0
Year                        0
Kilometers_Driven           0
Fuel_Type                   0
Transmission                0
Owner_Type                  0
Mileage                     0
Engine                      0
Power                       0
Seats                       0
Price                    1233
kilometers_driven_log       0
price_log                1233
Brand                       0
Model                       0
dtype: int64
df.shape
(7249, 16)
df=df.dropna()
df.shape
(6016, 16)
Observations for missing values after imputing:
We successfully imputed Power, Engine, Mileage and Seats. However, New_price had a significant number of missing values, so that column was dropped (along with new_price_log), and rows missing the target Price were dropped as well.
POTENTIAL TECHNIQUES
There is no strict sequence that we must absolutely follow when applying machine learning techniques to a problem. However, there are some general guidelines that can be used to guide the process:
First, we need to understand the problem that we are trying to solve and the type of data we are working with, as well as the type of outcome we want to predict. In our situation we are predicting a price, so the final method should output a single numeric value.
We also need to ensure that the data is clean. We did much of that work in this Milestone, where we performed EDA, univariate analysis on numerical and categorical data, bivariate analysis using scatterplots, box plots and a heat map, and feature engineering where we imputed missing values.
We will start with a simple technique, such as linear regression or a decision tree, which is easy to understand and interpret. This provides a performance baseline and a starting point for further experimentation.
Next, we can try ensemble methods like Random Forest, which can improve performance and reduce overfitting compared to individual decision trees.
If we are overfitting, we can use regularization techniques like Ridge or Lasso regression. These can also improve the performance and generalization of the model.
We could try more complex methods, such as neural networks, if the performance of the simpler techniques is not satisfactory, though this is unlikely to be necessary for this Machine Learning capstone.
Finally, we need to evaluate the performance of the different techniques using appropriate metrics (for a regression problem, mean absolute error, mean squared error or R-squared) and select the technique that performs best on our specific problem.
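As an illustration of this progression from a simple baseline to ensembles, here is a minimal sketch that compares the candidate techniques with 5-fold cross-validation on synthetic data (`X_demo`, `y_demo`, and the hyperparameter values are placeholders, not the Cars4U data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared Cars4U feature matrix
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}
results = {}
for name, m in models.items():
    # 5-fold cross-validated R-squared gives a quick, comparable baseline per model
    results[name] = cross_val_score(m, X_demo, y_demo, cv=5, scoring="r2").mean()
    print(f"{name}: mean R2 = {results[name]:.3f}")
```

On the real data we would run the same loop over the prepared X and y and use the scores to decide which model to tune further.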
OVERALL SOLUTION DESIGN
We will use the following process for the design of our solution:
EDA: examine summary statistics on numerical and categorical values to understand the data. This is an important step in which a deep understanding of the definition of each variable is developed.
Perform univariate analysis on numerical and categorical data to understand the shape of each variable and to check whether the distributions are normal. If not normal, perform a log transformation.
Perform bivariate analysis using scatterplots, box plots and a heat map to discover the relationships between the variables.
Perform feature engineering, where missing values are discovered and imputed.
Clean the data and prepare it for linear regression.
Separate the dependent and independent variables, then split the data into training and test sets.
Looking ahead to Milestone 2, we will build our supervised learning models: 1) Linear Regression, 2) Ridge/Lasso Regression, 3) Decision Trees, 4) Random Forest.
We will refine our analysis to uncover the insights most relevant to the problem.
We will provide a comparison of the various techniques and their relative performance: how each performed, which is better relative to the others, and whether there is scope for improvement.
Finally, we will make a proposal as to which model should be adopted and explain why it is the best solution for Cars4U.
MEASURES OF SUCCESS
We can provide some general guidelines for interpreting the most commonly used metrics for success, though their interpretation will depend on what we find in the various models. The first four below apply to regression problems such as ours; the last four apply to classification problems and are listed for completeness.
Mean Absolute Error (MAE): A smaller value for MAE indicates a better fit of the model to the data. A value of 0 would indicate a perfect fit.
Mean Squared Error (MSE): Similar to MAE, a smaller value for MSE indicates a better fit of the model to the data. A value of 0 would indicate a perfect fit.
Root Mean Squared Error (RMSE): Similar to MSE, a smaller value for RMSE indicates a better fit of the model to the data. A value of 0 would indicate a perfect fit.
R-squared: This metric ranges from 0 to 1, with 1 indicating a perfect fit of the model to the data. Values close to 1 indicate a good fit, while values close to 0 indicate a poor fit.
Accuracy: This metric ranges from 0 to 1, with 1 indicating that all instances were correctly classified. A value of 0.8 or higher is often considered good, but again it depends on the problem and the cost of false positives and false negatives.
Precision: This metric ranges from 0 to 1, with 1 indicating that all positive predictions were correct. A value of 0.8 or higher is often considered good.
Recall: This metric ranges from 0 to 1, with 1 indicating that all actual positive instances were correctly identified. A value of 0.8 or higher is often considered good.
F1-score: This metric ranges from 0 to 1, with 1 indicating a perfect balance of precision and recall. A value of 0.8 or higher is often considered good.
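For the regression metrics above, a small worked example with scikit-learn (the `y_true` and `y_pred` arrays are made-up illustrative prices, not model output):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true vs. predicted prices (in lakhs), for illustration only
y_true = np.array([1.75, 12.5, 4.5, 6.0, 17.74])
y_pred = np.array([2.0, 11.8, 4.9, 5.5, 16.9])

mae = mean_absolute_error(y_true, y_pred)   # Average absolute error
mse = mean_squared_error(y_true, y_pred)    # Average squared error
rmse = np.sqrt(mse)                         # Same units as the target
r2 = r2_score(y_true, y_pred)               # Fraction of variance explained
print(mae, rmse, r2)
```

Because RMSE is in the same units as Price, it is often the easiest of these to explain to the dealership audience.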
Please save the pre-processed dataset to a separate file so that we can continue without having to repeat the work we did in Milestone 1. The stored data frame can then be loaded in Milestone 2.
To save the pre-processed data frame, use the following lines of code:
# Assume df_cleaned is the pre-processed data frame in your code, then
df_cleaned=df
df_cleaned.to_csv("cars_data_updated.csv", index = False)
The above code saves the cleaned/pre-processed dataset to a csv file that can then be loaded in Milestone 2.
Note: Please load the data frame that was saved in Milestone 1 here before separating the data, and then proceed to the next step in Milestone 2.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import mean_squared_error
# Import library for plotting data
import matplotlib.pyplot as plt
# To ignore warnings
import warnings
warnings.filterwarnings('ignore')
cars_data = pd.read_csv("cars_data_updated.csv")
cars_data.shape
(7249, 16)
cars_data.head()
| Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | Price | kilometers_driven_log | price_log | Brand | Model | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Maruti Wagon R LXI CNG | Mumbai | 2010 | 72000 | CNG | Manual | First | 26.60 | 998.0 | 58.16 | 5.0 | 1.75 | 11.184421 | 0.559616 | Maruti | WagonR |
| 1 | Hyundai Creta 1.6 CRDi SX Option | Pune | 2015 | 41000 | Diesel | Manual | First | 19.67 | 1582.0 | 126.20 | 5.0 | 12.50 | 10.621327 | 2.525729 | Hyundai | Creta1.6 |
| 2 | Honda Jazz V | Chennai | 2011 | 46000 | Petrol | Manual | First | 18.20 | 1199.0 | 88.70 | 5.0 | 4.50 | 10.736397 | 1.504077 | Honda | JazzV |
| 3 | Maruti Ertiga VDI | Chennai | 2012 | 87000 | Diesel | Manual | First | 20.77 | 1248.0 | 88.76 | 7.0 | 6.00 | 11.373663 | 1.791759 | Maruti | ErtigaVDI |
| 4 | Audi A4 New 2.0 TDI Multitronic | Coimbatore | 2013 | 40670 | Diesel | Automatic | Second | 15.20 | 1968.0 | 140.80 | 5.0 | 17.74 | 10.613246 | 2.875822 | Audi | A4New |
# This drops all records that have null values in price_log
cars_data.dropna(subset=["price_log"], inplace=True)
#check data types
cars_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6016 entries, 0 to 6015
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Name                   6016 non-null   object
 1   Location               6016 non-null   object
 2   Year                   6016 non-null   int64
 3   Kilometers_Driven      6016 non-null   int64
 4   Fuel_Type              6016 non-null   object
 5   Transmission           6016 non-null   object
 6   Owner_Type             6016 non-null   object
 7   Mileage                6016 non-null   float64
 8   Engine                 6016 non-null   float64
 9   Power                  6016 non-null   float64
 10  Seats                  6016 non-null   float64
 11  Price                  6016 non-null   float64
 12  kilometers_driven_log  6016 non-null   float64
 13  price_log              6016 non-null   float64
 14  Brand                  6016 non-null   object
 15  Model                  6016 non-null   object
dtypes: float64(7), int64(2), object(7)
memory usage: 799.0+ KB
cars_data.shape
(6016, 16)
#count of unique features
cars_data.nunique()
Name                     1874
Location                   11
Year                       22
Kilometers_Driven        3092
Fuel_Type                   4
Transmission                2
Owner_Type                  4
Mileage                   430
Engine                    145
Power                     369
Seats                       8
Price                    1373
kilometers_driven_log    3092
price_log                1373
Brand                      30
Model                     687
dtype: int64
## Check total number of missing values of each column. Hint: Use isnull() method
cars_data.isnull().sum()
Name                     0
Location                 0
Year                     0
Kilometers_Driven        0
Fuel_Type                0
Transmission             0
Owner_Type               0
Mileage                  0
Engine                   0
Power                    0
Seats                    0
Price                    0
kilometers_driven_log    0
price_log                0
Brand                    0
Model                    0
dtype: int64
# Visualization of correlation
# import pandas as pd - already imported at the beginning
import seaborn as sns
# calculate the correlation matrix
corr = cars_data.corr()
# create a heatmap of the correlation matrix
sns.heatmap(corr, annot=True)
<AxesSubplot:>
Think about it: Why we should drop 'Name','Price','price_log','Kilometers_Driven' from X before splitting?
SPLITTING THE DATA In linear regression, we are trying to build a model to predict the value of a dependent variable based on the values of one or more independent variables. When splitting the data into training and testing sets, it's important to drop the dependent variable from the dataset we use for training the model. This is because the model should not have access to the dependent variable during the training process, as it would not be reflective of real-world scenarios when we use the model to make predictions on new, unseen data. If the dependent variable is included in the training data, the model may simply memorize the data rather than learn the underlying relationships between the independent and dependent variables. By only using the independent variables to train the model, we can ensure that it has learned the underlying relationships and is able to make accurate predictions on new data.
Name should be dropped because it has too many unique values; we will use Brand and/or Model instead. Price and price_log should be dropped when building the model because they are the targets. Kilometers_Driven is dropped because it has been replaced by its log transform, kilometers_driven_log.
TESTING THE DATA Note that the dependent variable data is typically only used during the evaluation of the model. This means that after the model has been trained on the training data, which does not include the target variable, the model is then tested on a separate set of data called the testing data. The testing data should include the target variable, so that we can compare the predictions made by the model to the actual values of the target variable in the testing data. This allows us to evaluate the performance of the model and determine how well it is able to make predictions on new, unseen data. The evaluation metric used could be mean squared error, R squared, adjusted R squared etc. In summary, the dependent variable data is not used during the training process, but is used during the evaluation process to measure the model's performance.
# Step-1
X = cars_data.drop(['Name','Price','price_log','Kilometers_Driven'], axis = 1)
y = cars_data[["price_log", "Price"]]
X.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6016 entries, 0 to 6015
Data columns (total 12 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Location               6016 non-null   object
 1   Year                   6016 non-null   int64
 2   Fuel_Type              6016 non-null   object
 3   Transmission           6016 non-null   object
 4   Owner_Type             6016 non-null   object
 5   Mileage                6016 non-null   float64
 6   Engine                 6016 non-null   float64
 7   Power                  6016 non-null   float64
 8   Seats                  6016 non-null   float64
 9   kilometers_driven_log  6016 non-null   float64
 10  Brand                  6016 non-null   object
 11  Model                  6016 non-null   object
dtypes: float64(5), int64(1), object(6)
memory usage: 611.0+ KB
y.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6016 entries, 0 to 6015
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   price_log  6016 non-null   float64
 1   Price      6016 non-null   float64
dtypes: float64(2)
memory usage: 141.0 KB
# Check total number of missing values of each column in X. Hint: Use isnull() method
X.isnull().sum()
Location                 0
Year                     0
Fuel_Type                0
Transmission             0
Owner_Type               0
Mileage                  0
Engine                   0
Power                    0
Seats                    0
kilometers_driven_log    0
Brand                    0
Model                    0
dtype: int64
# Check total number of missing values of each column in y. Hint: Use isnull() method
y.isnull().sum()
price_log    0
Price        0
dtype: int64
# Step-2 Use pd.get_dummies(drop_first = True)
X = pd.get_dummies(X, drop_first = True)
X.head()
| Year | Mileage | Engine | Power | Seats | kilometers_driven_log | Location_Bangalore | Location_Chennai | Location_Coimbatore | Location_Delhi | ... | Model_i201.4 | Model_i202015-2017 | Model_i20Active | Model_i20Asta | Model_i20Diesel | Model_i20Era | Model_i20Magna | Model_i20Sportz | Model_redi-GOS | Model_redi-GOT | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010 | 26.60 | 998.0 | 58.16 | 5.0 | 11.184421 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2015 | 19.67 | 1582.0 | 126.20 | 5.0 | 10.621327 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2011 | 18.20 | 1199.0 | 88.70 | 5.0 | 10.736397 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 2012 | 20.77 | 1248.0 | 88.76 | 7.0 | 11.373663 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2013 | 15.20 | 1968.0 | 140.80 | 5.0 | 10.613246 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 738 columns
#Check shape of X
X.shape
(6016, 738)
#Check shape of y
y.shape
(6016, 2)
# Import library for preparing data
from sklearn.model_selection import train_test_split
# Step-3 Splitting data into training and test set:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(4211, 738) (1805, 738) (4211, 2) (1805, 2)
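One caveat worth noting for the future app: a new record one-hot encoded on its own will usually produce fewer dummy columns than the training matrix. A common remedy, sketched here on a toy frame (`train`, `new`, and `X_train_demo` are illustrative names), is to reindex the new data against the training columns:

```python
import pandas as pd

# Toy training frame and its dummy-encoded version
train = pd.DataFrame({"Fuel_Type": ["Petrol", "Diesel", "CNG"]})
X_train_demo = pd.get_dummies(train, drop_first=True)  # Fuel_Type_Diesel, Fuel_Type_Petrol

# A single new record produces only the columns for its own categories
new = pd.DataFrame({"Fuel_Type": ["Petrol"]})
X_new = pd.get_dummies(new)
# Align to the training columns: dummies absent from the new data are filled
# with 0, and any column for an unseen category is simply dropped
X_new = X_new.reindex(columns=X_train_demo.columns, fill_value=0)
```

This keeps the feature matrix fed to the trained model identical in shape and column order to the one used during fitting.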
Next, we define the functions that we will use to evaluate each model created in this notebook. We will use these scores to determine the best solution to our business problem.
1) The get_model_score function uses R-squared 2) The get_model_score_adjusted_R2 function uses Adjusted R-squared
This function evaluates a given regression model. It takes the fitted model as input and returns a list of four scores: the training set R-squared, the test set R-squared, the training set RMSE, and the test set RMSE.
The code first makes predictions on the training and test sets using the input model, and then uses the R-squared and RMSE metrics from the scikit-learn library to evaluate the performance of the model. The scores are stored in the 'score_list' list and returned at the end of the function.
Additionally, if the flag input is set to True (the default), the function also prints the R-squared and RMSE scores for both the training and test sets.
# Let us write a function for calculating r2_score and RMSE on train and test data.
# This function takes as input a model on which we have trained a particular algorithm.
# Since the model was trained on price_log, predictions are exponentiated back to the Price scale before scoring.
def get_model_score(model, flag = True):
    '''
    model : regressor to predict values of X
    '''
    # Defining an empty list to store train and test results
    score_list = []
    pred_train = model.predict(X_train)
    pred_train_ = np.exp(pred_train)  # Back-transform from log scale to the Price scale
    pred_test = model.predict(X_test)
    pred_test_ = np.exp(pred_test)
    train_r2 = metrics.r2_score(y_train['Price'], pred_train_)
    test_r2 = metrics.r2_score(y_test['Price'], pred_test_)
    train_rmse = metrics.mean_squared_error(y_train['Price'], pred_train_, squared = False)
    test_rmse = metrics.mean_squared_error(y_test['Price'], pred_test_, squared = False)
    # Adding all scores to the list
    score_list.extend((train_r2, test_r2, train_rmse, test_rmse))
    # The following print statements are displayed only if the flag is set to True (the default)
    if flag == True:
        print("R-square on training set : ", train_r2)
        print("R-square on test set : ", test_r2)
        print("RMSE on training set : ", train_rmse)
        print("RMSE on test set : ", test_rmse)
    # Returning the list with train and test scores
    return score_list
# This function uses Adjusted R-squared
def get_model_score_adjusted_R2(model, flag = True):
    '''
    model : regressor to predict values of X
    '''
    pred_train = model.predict(X_train)
    pred_train_ = np.exp(pred_train)  # Back-transform from log scale to the Price scale
    pred_test = model.predict(X_test)
    pred_test_ = np.exp(pred_test)
    # Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), with n observations and p predictors
    n = X_train.shape[0]
    p = X_train.shape[1]
    train_r2 = 1 - (1 - metrics.r2_score(y_train['Price'], pred_train_)) * (n - 1) / (n - p - 1)
    n = X_test.shape[0]
    p = X_test.shape[1]
    test_r2 = 1 - (1 - metrics.r2_score(y_test['Price'], pred_test_)) * (n - 1) / (n - p - 1)
    train_rmse = np.sqrt(metrics.mean_squared_error(y_train['Price'], pred_train_))
    test_rmse = np.sqrt(metrics.mean_squared_error(y_test['Price'], pred_test_))
    score_list_adjusted_R2 = [train_r2, test_r2, train_rmse, test_rmse]
    if flag:
        print("Adjusted R2 on training set : ", train_r2)
        print("Adjusted R2 on test set : ", test_r2)
        print("RMSE on training set : ", train_rmse)
        print("RMSE on test set : ", test_rmse)
    return score_list_adjusted_R2
For Regression Problems, some of the algorithms used are :
1) Linear Regression
2) Ridge / Lasso Regression
3) Decision Trees
4) Random Forest
Linear Regression can be implemented using:
1) Sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
2) Statsmodels: https://www.statsmodels.org/stable/regression.html
# Import Linear Regression from sklearn
from sklearn.linear_model import LinearRegression
# Create a linear regression model
lr = LinearRegression()
# Fit linear regression model
lr.fit(X_train, y_train['price_log'])
LinearRegression()
# Get score of the model
LR_score = get_model_score(lr)
R-square on training set :  0.9632326167143936
R-square on test set :  0.8884457821245254
RMSE on training set :  2.1857914088366726
RMSE on test set :  3.541569655485
print(LR_score)
[0.9632326167143936, 0.8884457821245254, 2.1857914088366726, 3.541569655485]
X.shape
(6016, 738)
y.shape
(6016, 2)
Checking for the assumptions and rebuilding the model
y_pred = lr.predict(X_test)
residuals = y_test['price_log'] - y_pred
print(residuals)
5460 0.334171
4367 -0.018583
1227 0.166880
2253 -0.359454
79 -0.129142
...
188 -0.038821
5218 0.135975
3884 0.113695
3978 0.024030
2698 0.031614
Name: price_log, Length: 1805, dtype: float64
residuals.mean()
-0.010740112265732153
sns.histplot(residuals, kde = True)
<AxesSubplot:xlabel='price_log', ylabel='Count'>
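Beyond eyeballing the histogram, normality of the residuals can be tested formally. A hedged sketch using SciPy's Shapiro-Wilk test on synthetic stand-in residuals (in the notebook, the `residuals` series above would be passed instead):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
demo_residuals = rng.normal(loc=0.0, scale=0.2, size=500)  # stand-in for the model residuals

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution
stat, p_value = stats.shapiro(demo_residuals)
print(f"Shapiro-Wilk statistic={stat:.4f}, p-value={p_value:.4f}")
# A large p-value (> 0.05) means we fail to reject normality
```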
# Predicted values on the training set
fitted = lr.predict(X_train)
residuals = y_train['price_log'] - fitted

sns.residplot(x=fitted, y=residuals, color="lightblue")
plt.xlabel("Fitted Values")
plt.ylabel("Residual")
plt.title("Residual Plot")
plt.show()
import seaborn as sns
y_pred = lr.predict(X_test)
residuals = y_test['price_log'] - y_pred
p = sns.scatterplot(x=y_pred, y=residuals)
plt.xlabel('y_pred/predicted values')
plt.ylabel('Residuals')
plt.ylim(-5, 5)
plt.xlim(0, 26)
p = sns.lineplot(x=[0, 26], y=[0, 0], color='blue')
p = plt.title('Residuals vs fitted values plot for homoscedasticity check')
# get the coefficients of the linear regression model
coefs = lr.coef_
# create a dataframe to store the coefficients
coef_df = pd.DataFrame({'Feature': X_train.columns, 'Coefficient': coefs})
# sort the dataframe by the magnitude of the coefficients
coef_df['Absolute_Coefficient'] = coef_df['Coefficient'].abs()
coef_df.sort_values(by='Absolute_Coefficient', ascending=False, inplace=True)
# print the feature coefficients
print("Feature Coefficients: \n", coef_df)
Feature Coefficients:
Feature Coefficient Absolute_Coefficient
162 Model_CayenneBase -3.224471e+00 3.224471e+00
422 Model_MustangV8 1.383597e+00 1.383597e+00
427 Model_NanoSTD -1.252193e+00 1.252193e+00
352 Model_Ikon1.4 -1.231596e+00 1.231596e+00
36 Brand_Lamborghini 1.159825e+00 1.159825e+00
.. ... ... ...
177 Model_CiazAlpha 3.733125e-15 3.733125e-15
700 Model_XUV500W9 5.551115e-17 5.551115e-17
683 Model_XE2.0L 1.259030e-17 1.259030e-17
715 Model_ZenLXI 0.000000e+00 0.000000e+00
684 Model_XEPortfolio 0.000000e+00 0.000000e+00
[738 rows x 3 columns]
# keep only the original (non-dummy) features
# note: this filter also drops engineered columns containing '_', e.g. kilometers_driven_log
original_columns = [col for col in X_train.columns if '_' not in col]
coef_df = coef_df[coef_df['Feature'].isin(original_columns)]
top_5 = coef_df.sort_values(by='Coefficient', ascending=False).head(5)
plt.barh(top_5['Feature'], top_5['Coefficient'])
plt.xlabel('Coefficient')
plt.ylabel('Feature')
plt.title('Top 5 Original Features and Their Coefficients')
plt.show()
Observations from results:
The R-square on the training set (0.9632) is higher than on the test set (0.8885). A high R-squared (closer to 1) means the model explains a large proportion of the variance in the data, but the gap between the training and test scores indicates the model is overfitting the training data.
The RMSE on the training set (2.19) is lower than on the test set (3.54). RMSE (Root Mean Squared Error) measures the difference between predicted and actual values; the lower the RMSE, the better the model. The noticeably higher test RMSE again shows the model is not generalizing well to unseen data.
Overall, the linear regression model overfits the training data and does not generalize well to unseen data.
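Since Ridge/Lasso regression was listed among the candidate algorithms, a common remedy for this kind of overfitting is L2 regularization. A sketch on synthetic data with many features, loosely mimicking the wide one-hot encoded design matrix here (all names are illustrative, not the notebook's data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
# many mostly-irrelevant features relative to the sample size
X_demo = rng.normal(size=(120, 80))
y_demo = X_demo[:, 0] * 2.0 + rng.normal(scale=1.0, size=120)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

ols = LinearRegression().fit(X_tr, y_tr)
ridge = Ridge(alpha=10.0).fit(X_tr, y_tr)

print("OLS   train/test R2:", ols.score(X_tr, y_tr), ols.score(X_te, y_te))
print("Ridge train/test R2:", ridge.score(X_tr, y_tr), ridge.score(X_te, y_te))
# The L2 penalty shrinks coefficients, narrowing the train-test gap
```

In practice `alpha` would be tuned by cross-validation (e.g. `RidgeCV`) on the actual car data rather than fixed at 10.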
Important variables of Linear Regression
Building a model using statsmodels.
# Import Statsmodels
import statsmodels.api as sm
# Statsmodel api does not add a constant by default. We need to add it explicitly
x_train = sm.add_constant(X_train)
# Add constant to test data
x_test = sm.add_constant(X_test)
def build_ols_model(train):
    # Create the OLS model (y_train comes from the enclosing scope)
    olsmodel = sm.OLS(y_train["price_log"], train)
    return olsmodel.fit()
# Fit linear model on new dataset
olsmodel1 = build_ols_model(X_train)
print(olsmodel1.summary())
OLS Regression Results
==============================================================================
Dep. Variable: price_log R-squared: 0.973
Model: OLS Adj. R-squared: 0.969
Method: Least Squares F-statistic: 204.2
Date: Thu, 02 Feb 2023 Prob (F-statistic): 0.00
Time: 12:22:32 Log-Likelihood: 2207.0
No. Observations: 4211 AIC: -3134.
Df Residuals: 3571 BIC: 927.1
Df Model: 639
Covariance Type: nonrobust
=============================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------------
Year 0.0997 0.002 65.021 0.000 0.097 0.103
Mileage -0.0039 0.002 -2.493 0.013 -0.007 -0.001
Engine 9.227e-06 2.9e-05 0.318 0.750 -4.76e-05 6.6e-05
Power 0.0014 0.000 4.087 0.000 0.001 0.002
Seats 0.0118 0.019 0.608 0.543 -0.026 0.050
kilometers_driven_log -0.0760 0.005 -14.577 0.000 -0.086 -0.066
Location_Bangalore 0.1733 0.017 10.163 0.000 0.140 0.207
Location_Chennai 0.0485 0.016 2.987 0.003 0.017 0.080
Location_Coimbatore 0.1419 0.015 9.177 0.000 0.112 0.172
Location_Delhi -0.0932 0.016 -5.950 0.000 -0.124 -0.062
Location_Hyderabad 0.1470 0.015 9.765 0.000 0.117 0.177
Location_Jaipur -0.0289 0.017 -1.745 0.081 -0.061 0.004
Location_Kochi -0.0098 0.015 -0.630 0.529 -0.040 0.021
Location_Kolkata -0.2266 0.016 -14.190 0.000 -0.258 -0.195
Location_Mumbai -0.0768 0.015 -5.106 0.000 -0.106 -0.047
Location_Pune -0.0337 0.016 -2.151 0.032 -0.064 -0.003
Fuel_Type_Diesel 0.0166 0.031 0.530 0.596 -0.045 0.078
Fuel_Type_LPG -0.0655 0.076 -0.859 0.390 -0.215 0.084
Fuel_Type_Petrol -0.0912 0.032 -2.852 0.004 -0.154 -0.028
Transmission_Manual -0.0960 0.010 -9.375 0.000 -0.116 -0.076
Owner_Type_Fourth & Above -0.0864 0.074 -1.172 0.241 -0.231 0.058
Owner_Type_Second -0.0514 0.008 -6.637 0.000 -0.067 -0.036
Owner_Type_Third -0.1223 0.021 -5.920 0.000 -0.163 -0.082
Brand_Audi -190.4469 3.009 -63.282 0.000 -196.347 -184.546
Brand_BMW -187.0584 2.958 -63.244 0.000 -192.857 -181.259
Brand_Bentley -97.9801 1.556 -62.987 0.000 -101.030 -94.930
Brand_Chevrolet -189.1596 2.963 -63.834 0.000 -194.969 -183.350
Brand_Datsun -170.6479 2.668 -63.953 0.000 -175.879 -165.416
Brand_Fiat -183.4456 2.871 -63.896 0.000 -189.075 -177.817
Brand_Force -98.8717 1.560 -63.395 0.000 -101.930 -95.814
Brand_Ford -192.2491 3.017 -63.715 0.000 -198.165 -186.333
Brand_Honda -194.2222 3.048 -63.726 0.000 -200.198 -188.247
Brand_Hyundai -194.8630 3.054 -63.807 0.000 -200.851 -188.875
Brand_Isuzu -99.0671 1.560 -63.498 0.000 -102.126 -96.008
Brand_Jaguar -176.9958 2.802 -63.174 0.000 -182.489 -171.503
Brand_Jeep -98.8187 1.559 -63.395 0.000 -101.875 -95.762
Brand_Lamborghini -97.7633 1.557 -62.777 0.000 -100.817 -94.710
Brand_Land Rover -147.6478 2.338 -63.164 0.000 -152.231 -143.065
Brand_Mahindra -193.8566 3.047 -63.621 0.000 -199.831 -187.882
Brand_Maruti -198.5188 3.096 -64.122 0.000 -204.589 -192.449
Brand_Mercedes-Benz -191.4421 3.026 -63.258 0.000 -197.376 -185.509
Brand_Mini Cooper -172.4000 2.723 -63.312 0.000 -177.739 -167.061
Brand_Mitsubishi -173.0238 2.722 -63.565 0.000 -178.361 -167.687
Brand_Nissan -185.1204 2.905 -63.729 0.000 -190.816 -179.425
Brand_Porsche -175.0380 2.768 -63.240 0.000 -180.465 -169.611
Brand_Renault -188.5606 2.957 -63.770 0.000 -194.358 -182.763
Brand_Skoda -192.2906 3.018 -63.721 0.000 -198.207 -186.374
Brand_Smart -99.1499 1.556 -63.725 0.000 -102.200 -96.099
Brand_Tata -194.0216 3.035 -63.925 0.000 -199.972 -188.071
Brand_Toyota -191.5153 3.012 -63.582 0.000 -197.421 -185.610
Brand_Volkswagen -190.8784 2.997 -63.682 0.000 -196.755 -185.002
Brand_Volvo -175.3870 2.768 -63.362 0.000 -180.814 -169.960
Model_1Series -10.4063 0.188 -55.323 0.000 -10.775 -10.037
Model_3Series -10.2546 0.155 -66.004 0.000 -10.559 -9.950
Model_5Series -9.9958 0.157 -63.744 0.000 -10.303 -9.688
Model_6Series -9.3836 0.174 -53.847 0.000 -9.725 -9.042
Model_7Series -9.5631 0.166 -57.505 0.000 -9.889 -9.237
Model_800AC -0.7719 0.177 -4.371 0.000 -1.118 -0.426
Model_800DX 1.018e-11 8.9e-13 11.441 0.000 8.44e-12 1.19e-11
Model_800Std -0.5967 0.193 -3.097 0.002 -0.974 -0.219
Model_A-StarAT -0.2263 0.192 -1.178 0.239 -0.603 0.150
Model_A-StarLxi -0.1284 0.181 -0.710 0.478 -0.483 0.226
Model_A-StarVxi -0.0980 0.172 -0.571 0.568 -0.435 0.239
Model_A335 -6.9104 0.132 -52.516 0.000 -7.168 -6.652
Model_A41.8 -6.8775 0.149 -46.248 0.000 -7.169 -6.586
Model_A42.0 -6.8241 0.107 -63.648 0.000 -7.034 -6.614
Model_A43.0 -7.0016 0.135 -51.879 0.000 -7.266 -6.737
Model_A43.2 8.799e-12 1.29e-12 6.820 0.000 6.27e-12 1.13e-11
Model_A430 -6.8209 0.186 -36.671 0.000 -7.186 -6.456
Model_A435 -6.8321 0.123 -55.571 0.000 -7.073 -6.591
Model_A4New -6.9809 0.136 -51.180 0.000 -7.248 -6.713
Model_A62.0 -6.2528 0.185 -33.880 0.000 -6.615 -5.891
Model_A62.7 -6.8744 0.129 -53.346 0.000 -7.127 -6.622
Model_A62.8 -6.7536 0.182 -37.179 0.000 -7.110 -6.397
Model_A62011-2015 -6.6467 0.111 -59.636 0.000 -6.865 -6.428
Model_A63.0 -6.9279 0.134 -51.887 0.000 -7.190 -6.666
Model_A635 -6.4954 0.127 -51.293 0.000 -6.744 -6.247
Model_A72011-2015 -6.1878 0.188 -32.959 0.000 -6.556 -5.820
Model_A8L -5.7975 0.184 -31.523 0.000 -6.158 -5.437
Model_AClass -5.9945 0.114 -52.575 0.000 -6.218 -5.771
Model_AccentCRDi -3.9628 0.120 -32.909 0.000 -4.199 -3.727
Model_AccentExecutive 3.258e-11 3.66e-12 8.890 0.000 2.54e-11 3.98e-11
Model_AccentGLE -3.9540 0.071 -55.455 0.000 -4.094 -3.814
Model_AccentGLS -3.8569 0.103 -37.375 0.000 -4.059 -3.655
Model_Accord2.4 -3.7927 0.078 -48.905 0.000 -3.945 -3.641
Model_Accord2001-2003 -3.6696 0.127 -28.955 0.000 -3.918 -3.421
Model_AccordV6 -4.1531 0.168 -24.789 0.000 -4.482 -3.825
Model_AccordVTi-L -4.0594 0.128 -31.711 0.000 -4.310 -3.808
Model_Alto800 -0.4585 0.160 -2.862 0.004 -0.773 -0.144
Model_AltoGreen -0.4692 0.223 -2.102 0.036 -0.907 -0.031
Model_AltoK10 -0.3591 0.160 -2.247 0.025 -0.673 -0.046
Model_AltoLX -0.6173 0.192 -3.215 0.001 -0.994 -0.241
Model_AltoLXI -3.96e-12 8.81e-13 -4.493 0.000 -5.69e-12 -2.23e-12
Model_AltoLXi -0.2642 0.160 -1.648 0.099 -0.579 0.050
Model_AltoStd -0.2983 0.191 -1.558 0.119 -0.674 0.077
Model_AltoVXi -5.404e-12 6.62e-13 -8.165 0.000 -6.7e-12 -4.11e-12
Model_AltoVxi 0.1386 0.221 0.626 0.531 -0.295 0.572
Model_AmazeE -4.2098 0.085 -49.309 0.000 -4.377 -4.042
Model_AmazeEX -4.3100 0.110 -39.351 0.000 -4.525 -4.095
Model_AmazeS -4.2362 0.071 -59.441 0.000 -4.376 -4.096
Model_AmazeSX -4.2757 0.101 -42.256 0.000 -4.474 -4.077
Model_AmazeV -4.2274 0.127 -33.191 0.000 -4.477 -3.978
Model_AmazeVX -4.1739 0.077 -54.159 0.000 -4.325 -4.023
Model_Ameo1.2 -7.6078 0.128 -59.497 0.000 -7.859 -7.357
Model_Ameo1.5 -7.5577 0.147 -51.337 0.000 -7.846 -7.269
Model_AspireAmbiente -6.2611 0.181 -34.533 0.000 -6.617 -5.906
Model_AspireTitanium -5.9793 0.130 -45.825 0.000 -6.235 -5.723
Model_Aveo1.4 -9.5678 0.170 -56.322 0.000 -9.901 -9.235
Model_Aveo1.6 -9.7178 0.209 -46.450 0.000 -10.128 -9.308
Model_AveoU-VA -9.7213 0.160 -60.940 0.000 -10.034 -9.408
Model_AvventuraMULTIJET -14.9094 0.259 -57.580 0.000 -15.417 -14.402
Model_BClass -6.0582 0.105 -57.530 0.000 -6.265 -5.852
Model_BR-Vi-DTEC 6.159e-13 8.54e-13 0.721 0.471 -1.06e-12 2.29e-12
Model_BR-Vi-VTEC -4.0353 0.134 -30.145 0.000 -4.298 -3.773
Model_BRVi-VTEC -3.9152 0.104 -37.533 0.000 -4.120 -3.711
Model_BalenoAlpha 0.2921 0.163 1.793 0.073 -0.027 0.612
Model_BalenoDelta 0.1067 0.166 0.644 0.520 -0.218 0.431
Model_BalenoLXI -0.4686 0.192 -2.438 0.015 -0.845 -0.092
Model_BalenoRS 0.2623 0.176 1.490 0.136 -0.083 0.607
Model_BalenoSigma 0.0793 0.182 0.436 0.663 -0.277 0.436
Model_BalenoVxi -0.3108 0.192 -1.616 0.106 -0.688 0.066
Model_BalenoZeta 0.1698 0.164 1.035 0.301 -0.152 0.492
Model_BeatDiesel -9.7274 0.154 -62.962 0.000 -10.030 -9.425
Model_BeatLS -9.6412 0.159 -60.686 0.000 -9.953 -9.330
Model_BeatLT -9.6850 0.155 -62.370 0.000 -9.989 -9.381
Model_BeatOption 2.326e-12 7.66e-13 3.038 0.002 8.25e-13 3.83e-12
Model_Beetle2.0 1.4e-12 8e-13 1.749 0.080 -1.69e-13 2.97e-12
Model_BoleroDI -4.3241 0.168 -25.667 0.000 -4.654 -3.994
Model_BoleroSLE -4.6119 0.170 -27.160 0.000 -4.945 -4.279
Model_BoleroSLX -4.3986 0.169 -25.971 0.000 -4.731 -4.067
Model_BoleroVLX -4.3062 0.170 -25.279 0.000 -4.640 -3.972
Model_BoleroZLX -4.3942 0.098 -44.822 0.000 -4.586 -4.202
Model_BoleromHAWK 3.808e-12 7.56e-13 5.040 0.000 2.33e-12 5.29e-12
Model_BoltQuadrajet -4.6461 0.136 -34.071 0.000 -4.913 -4.379
Model_BoltRevotron -4.8586 0.172 -28.265 0.000 -5.196 -4.522
Model_BoxsterS -1.907e-12 7.73e-13 -2.468 0.014 -3.42e-12 -3.92e-13
Model_Brio1.2 -4.3234 0.101 -42.859 0.000 -4.521 -4.126
Model_BrioE -4.3443 0.166 -26.130 0.000 -4.670 -4.018
Model_BrioEX -4.4702 0.167 -26.704 0.000 -4.798 -4.142
Model_BrioS -4.3564 0.071 -60.978 0.000 -4.496 -4.216
Model_BrioV -4.3053 0.090 -48.050 0.000 -4.481 -4.130
Model_BrioVX -4.3074 0.088 -48.723 0.000 -4.481 -4.134
Model_C-ClassProgressive -5.8321 0.148 -39.441 0.000 -6.122 -5.542
Model_CLA200 -5.8348 0.104 -55.887 0.000 -6.039 -5.630
Model_CLS-Class2006-2010 -5.2682 0.175 -30.183 0.000 -5.610 -4.926
Model_CR-V2.0 2.984e-12 8.38e-13 3.559 0.000 1.34e-12 4.63e-12
Model_CR-V2.0L -3.4664 0.095 -36.526 0.000 -3.652 -3.280
Model_CR-V2.4 -3.6665 0.090 -40.900 0.000 -3.842 -3.491
Model_CR-V2.4L -3.3975 0.101 -33.706 0.000 -3.595 -3.200
Model_CR-VAT -5.736e-13 8.67e-13 -0.662 0.508 -2.27e-12 1.13e-12
Model_CR-VPetrol -2.473e-12 6.82e-13 -3.626 0.000 -3.81e-12 -1.14e-12
Model_CR-VRVi -3.3289 0.126 -26.476 0.000 -3.575 -3.082
Model_CR-VSport -3.1921 0.168 -19.021 0.000 -3.521 -2.863
Model_Camry2.5 -5.5548 0.186 -29.843 0.000 -5.920 -5.190
Model_CamryA/T -8.829e-13 5.96e-13 -1.482 0.138 -2.05e-12 2.85e-13
Model_CamryHybrid -5.5055 0.138 -40.038 0.000 -5.775 -5.236
Model_CamryW2 -6.5387 0.182 -35.919 0.000 -6.896 -6.182
Model_CamryW4 -6.6148 0.182 -36.353 0.000 -6.972 -6.258
Model_CaptivaLT -2.426e-13 8.2e-13 -0.296 0.767 -1.85e-12 1.37e-12
Model_CaptivaLTZ -9.1088 0.212 -42.900 0.000 -9.525 -8.693
Model_Captur1.5 -9.4996 0.189 -50.314 0.000 -9.870 -9.129
Model_Cayenne2009-2014 -21.8549 0.362 -60.438 0.000 -22.564 -21.146
Model_CayenneBase -25.2074 0.380 -66.343 0.000 -25.952 -24.462
Model_CayenneDiesel -21.4012 0.377 -56.837 0.000 -22.139 -20.663
Model_CayenneS -21.6049 0.380 -56.876 0.000 -22.350 -20.860
Model_CayenneTurbo -21.6669 0.378 -57.262 0.000 -22.409 -20.925
Model_Cayman2009-2012 -21.0656 0.376 -56.016 0.000 -21.803 -20.328
Model_CediaSports 1.898e-12 8.27e-13 2.296 0.022 2.77e-13 3.52e-12
Model_CelerioCNG -0.0125 0.224 -0.056 0.955 -0.451 0.426
Model_CelerioLDi -0.4440 0.222 -2.001 0.046 -0.879 -0.009
Model_CelerioLXI -0.1513 0.182 -0.831 0.406 -0.508 0.205
Model_CelerioVXI -0.1642 0.161 -1.020 0.308 -0.480 0.151
Model_CelerioZDi -0.1584 0.222 -0.713 0.476 -0.594 0.277
Model_CelerioZXI -0.0793 0.164 -0.484 0.629 -0.401 0.242
Model_Ciaz1.3 0.3287 0.173 1.901 0.057 -0.010 0.668
Model_Ciaz1.4 0.4380 0.177 2.481 0.013 0.092 0.784
Model_CiazAT 0.3220 0.193 1.670 0.095 -0.056 0.700
Model_CiazAlpha 3.153e-12 9.59e-13 3.287 0.001 1.27e-12 5.03e-12
Model_CiazRS 0.5411 0.222 2.436 0.015 0.106 0.977
Model_CiazVDI 0.2534 0.182 1.393 0.164 -0.103 0.610
Model_CiazVDi 0.3388 0.169 2.007 0.045 0.008 0.670
Model_CiazVXi 0.3646 0.176 2.066 0.039 0.019 0.711
Model_CiazZDi 0.4153 0.164 2.534 0.011 0.094 0.737
Model_CiazZXi 0.4435 0.170 2.602 0.009 0.109 0.778
Model_CiazZeta 0.4020 0.193 2.083 0.037 0.024 0.780
Model_City1.3 -4.1017 0.095 -43.123 0.000 -4.288 -3.915
Model_City1.5 -4.0157 0.065 -61.754 0.000 -4.143 -3.888
Model_CityCorporate -3.9912 0.166 -24.090 0.000 -4.316 -3.666
Model_CityV -3.9466 0.076 -52.148 0.000 -4.095 -3.798
Model_CityZX -4.1572 0.073 -56.656 0.000 -4.301 -4.013
Model_Cityi -3.8646 0.069 -55.892 0.000 -4.000 -3.729
Model_Cityi-DTEC -3.6114 0.128 -28.140 0.000 -3.863 -3.360
Model_Cityi-VTEC -3.8434 0.078 -49.402 0.000 -3.996 -3.691
Model_Civic2006-2010 -4.0615 0.073 -55.839 0.000 -4.204 -3.919
Model_Civic2010-2013 -4.1479 0.086 -47.979 0.000 -4.317 -3.978
Model_Classic1.4 2.994e-12 6.37e-13 4.702 0.000 1.75e-12 4.24e-12
Model_ClassicNova -198.4473 3.106 -63.893 0.000 -204.537 -192.358
Model_ClubmanCooper -24.5038 0.417 -58.772 0.000 -25.321 -23.686
Model_Compass1.4 -1.347e-12 7.93e-13 -1.698 0.090 -2.9e-12 2.09e-13
Model_Compass2.0 -98.8187 1.559 -63.395 0.000 -101.875 -95.762
Model_ContinentalFlying -97.9801 1.556 -62.987 0.000 -101.030 -94.930
Model_Cooper3 -24.6235 0.395 -62.333 0.000 -25.398 -23.849
Model_Cooper5 -24.7427 0.397 -62.305 0.000 -25.521 -23.964
Model_CooperConvertible -24.4751 0.396 -61.837 0.000 -25.251 -23.699
Model_CooperCountryman -24.7224 0.401 -61.583 0.000 -25.509 -23.935
Model_CooperS -24.3721 0.400 -60.933 0.000 -25.156 -23.588
Model_Corolla1.8 -6.1820 0.184 -33.655 0.000 -6.542 -5.822
Model_CorollaAltis -6.3750 0.106 -59.873 0.000 -6.584 -6.166
Model_CorollaDX -6.7961 0.177 -38.370 0.000 -7.143 -6.449
Model_CorollaExecutive -6.6601 0.180 -37.061 0.000 -7.012 -6.308
Model_CorollaH2 -7.0250 0.180 -39.073 0.000 -7.378 -6.673
Model_CorollaH4 -6.7367 0.122 -55.083 0.000 -6.976 -6.497
Model_CorollaH5 -6.8531 0.145 -47.104 0.000 -7.138 -6.568
Model_CountrymanCooper -24.9604 0.417 -59.853 0.000 -25.778 -24.143
Model_Creta1.4 -3.0382 0.088 -34.400 0.000 -3.211 -2.865
Model_Creta1.6 -3.0055 0.066 -45.418 0.000 -3.135 -2.876
Model_CrossPolo1.5 -7.5774 0.158 -47.815 0.000 -7.888 -7.267
Model_CruzeLTZ -9.0739 0.161 -56.266 0.000 -9.390 -8.758
Model_D-MAXV-Cross -99.0671 1.560 -63.498 0.000 -102.126 -96.008
Model_Duster110PS -9.5441 0.161 -59.357 0.000 -9.859 -9.229
Model_Duster85PS -9.6185 0.161 -59.782 0.000 -9.934 -9.303
Model_DusterAdventure -9.6232 0.217 -44.358 0.000 -10.049 -9.198
Model_DusterPetrol -9.8249 0.219 -44.916 0.000 -10.254 -9.396
Model_DusterRXZ 7.797e-13 7.49e-13 1.041 0.298 -6.89e-13 2.25e-12
Model_DzireAMT 0.1008 0.177 0.570 0.569 -0.246 0.448
Model_DzireLDI 0.0997 0.223 0.448 0.654 -0.337 0.536
Model_DzireNew 0.3045 0.222 1.369 0.171 -0.132 0.741
Model_DzireVDI 0.2813 0.176 1.595 0.111 -0.065 0.627
Model_DzireVXI 0.2521 0.181 1.390 0.165 -0.104 0.608
Model_DzireZDI 0.2884 0.193 1.494 0.135 -0.090 0.667
Model_E-Class200 3.363e-12 1.14e-12 2.940 0.003 1.12e-12 5.61e-12
Model_E-Class2009-2013 -5.6691 0.094 -60.194 0.000 -5.854 -5.484
Model_E-Class2015-2017 -5.5356 0.103 -53.498 0.000 -5.738 -5.333
Model_E-Class220 9.172e-13 8.03e-13 1.143 0.253 -6.57e-13 2.49e-12
Model_E-Class230 -5.9963 0.124 -48.173 0.000 -6.240 -5.752
Model_E-Class250 -5.7455 0.167 -34.356 0.000 -6.073 -5.418
Model_E-Class280 -5.9938 0.109 -55.188 0.000 -6.207 -5.781
Model_E-ClassE -5.2590 0.141 -37.223 0.000 -5.536 -4.982
Model_E-ClassE250 -5.6625 0.111 -51.194 0.000 -5.879 -5.446
Model_E-ClassE270 -5.8159 0.171 -33.989 0.000 -6.151 -5.480
Model_E-ClassE350 -5.4948 0.177 -31.026 0.000 -5.842 -5.148
Model_E-ClassE400 -5.0602 0.178 -28.381 0.000 -5.410 -4.711
Model_E-ClassFacelift -5.5342 0.179 -30.920 0.000 -5.885 -5.183
Model_EON1.0 -1.499e-12 8.93e-13 -1.680 0.093 -3.25e-12 2.51e-13
Model_EOND -3.9973 0.075 -53.633 0.000 -4.143 -3.851
Model_EONEra -4.0156 0.073 -55.176 0.000 -4.158 -3.873
Model_EONLPG -4.633e-13 7.84e-13 -0.591 0.555 -2e-12 1.07e-12
Model_EONMagna -4.0661 0.079 -51.394 0.000 -4.221 -3.911
Model_EONSportz -4.0561 0.108 -37.581 0.000 -4.268 -3.844
Model_EcoSport1.0 -5.8898 0.133 -44.183 0.000 -6.151 -5.628
Model_EcoSport1.5 -5.9411 0.103 -57.553 0.000 -6.143 -5.739
Model_Ecosport1.0 -5.8605 0.180 -32.604 0.000 -6.213 -5.508
Model_Ecosport1.5 -5.8902 0.101 -58.596 0.000 -6.087 -5.693
Model_EcosportSignature -5.9285 0.147 -40.462 0.000 -6.216 -5.641
Model_Eeco5 -0.3819 0.178 -2.144 0.032 -0.731 -0.033
Model_Eeco7 -0.3282 0.172 -1.909 0.056 -0.665 0.009
Model_EecoCNG 1.696e-12 8.65e-13 1.960 0.050 -1.23e-16 3.39e-12
Model_EecoSmiles -7.436e-14 6.05e-13 -0.123 0.902 -1.26e-12 1.11e-12
Model_Elantra1.6 -3.0195 0.124 -24.297 0.000 -3.263 -2.776
Model_Elantra2.0 -2.7589 0.126 -21.923 0.000 -3.006 -2.512
Model_ElantraCRDi -3.0443 0.076 -40.255 0.000 -3.193 -2.896
Model_ElantraSX -2.9220 0.165 -17.670 0.000 -3.246 -2.598
Model_Elitei20 -3.4032 0.078 -43.355 0.000 -3.557 -3.249
Model_Endeavour2.2 -5.0802 0.134 -37.931 0.000 -5.343 -4.818
Model_Endeavour2.5L -5.5773 0.137 -40.693 0.000 -5.846 -5.309
Model_Endeavour3.0L -5.6725 0.128 -44.488 0.000 -5.922 -5.422
Model_Endeavour3.2 -5.0355 0.123 -40.974 0.000 -5.276 -4.794
Model_Endeavour4x2 -5.5870 0.154 -36.214 0.000 -5.890 -5.285
Model_EndeavourHurricane -5.7893 0.153 -37.919 0.000 -6.089 -5.490
Model_EndeavourTitanium -7.985e-14 4.78e-13 -0.167 0.867 -1.02e-12 8.58e-13
Model_EndeavourXLT -5.4897 0.149 -36.882 0.000 -5.782 -5.198
Model_Enjoy1.3 -9.3697 0.192 -48.893 0.000 -9.745 -8.994
Model_Enjoy1.4 -9.3457 0.218 -42.903 0.000 -9.773 -8.919
Model_EnjoyPetrol -9.4752 0.214 -44.216 0.000 -9.895 -9.055
Model_EnjoyTCDi -9.4791 0.180 -52.708 0.000 -9.832 -9.127
Model_ErtigaLXI 0.4891 0.225 2.172 0.030 0.048 0.931
Model_ErtigaPaseo 0.2952 0.226 1.307 0.191 -0.148 0.738
Model_ErtigaSHVS 0.3766 0.173 2.174 0.030 0.037 0.716
Model_ErtigaVDI 0.4251 0.167 2.546 0.011 0.098 0.753
Model_ErtigaVXI 0.4107 0.172 2.390 0.017 0.074 0.747
Model_ErtigaZDI 0.4537 0.167 2.715 0.007 0.126 0.781
Model_ErtigaZXI 0.4604 0.186 2.479 0.013 0.096 0.825
Model_EsteemLX -0.4936 0.222 -2.227 0.026 -0.928 -0.059
Model_EsteemVxi -0.4946 0.175 -2.828 0.005 -0.838 -0.152
Model_EstiloLXI -0.1249 0.181 -0.691 0.489 -0.479 0.229
Model_Etios1.4 -1.591e-12 7.08e-13 -2.246 0.025 -2.98e-12 -2.02e-13
Model_EtiosCross -6.8298 0.130 -52.610 0.000 -7.084 -6.575
Model_EtiosG -6.9041 0.124 -55.793 0.000 -7.147 -6.662
Model_EtiosGD -6.8827 0.124 -55.525 0.000 -7.126 -6.640
Model_EtiosLiva -7.0123 0.111 -63.387 0.000 -7.229 -6.795
Model_EtiosPetrol -6.8233 0.185 -36.954 0.000 -7.185 -6.461
Model_EtiosV -6.8643 0.184 -37.367 0.000 -7.225 -6.504
Model_EtiosVD -6.5880 0.135 -48.863 0.000 -6.852 -6.324
Model_EtiosVX -6.8650 0.149 -45.942 0.000 -7.158 -6.572
Model_EtiosVXD -2.356e-14 9.31e-13 -0.025 0.980 -1.85e-12 1.8e-12
Model_Evalia2013 -13.4928 0.261 -51.715 0.000 -14.004 -12.981
Model_FType -19.3359 0.349 -55.456 0.000 -20.020 -18.652
Model_Fabia1.2 -6.4629 0.115 -56.273 0.000 -6.688 -6.238
Model_Fabia1.2L -6.3157 0.177 -35.664 0.000 -6.663 -5.969
Model_Fabia1.4 -6.0552 0.177 -34.241 0.000 -6.402 -5.708
Model_Fabia1.6 -6.5606 0.179 -36.584 0.000 -6.912 -6.209
Model_Fiesta1.4 -6.3168 0.101 -62.706 0.000 -6.514 -6.119
Model_Fiesta1.5 -6.0775 0.180 -33.728 0.000 -6.431 -5.724
Model_Fiesta1.6 -6.4219 0.119 -53.896 0.000 -6.655 -6.188
Model_FiestaClassic -6.4737 0.118 -54.844 0.000 -6.705 -6.242
Model_FiestaDiesel -5.7724 0.178 -32.492 0.000 -6.121 -5.424
Model_FiestaEXi -6.3823 0.143 -44.766 0.000 -6.662 -6.103
Model_FiestaTitanium 3.366e-13 7.26e-13 0.464 0.643 -1.09e-12 1.76e-12
Model_Figo1.2P 3.353e-13 6e-13 0.558 0.577 -8.42e-13 1.51e-12
Model_Figo1.5D -6.0593 0.141 -42.902 0.000 -6.336 -5.782
Model_Figo2015-2019 -6.3651 0.113 -56.397 0.000 -6.586 -6.144
Model_FigoAspire -6.2831 0.113 -55.662 0.000 -6.504 -6.062
Model_FigoDiesel -6.4529 0.100 -64.730 0.000 -6.648 -6.257
Model_FigoPetrol -6.3359 0.105 -60.412 0.000 -6.541 -6.130
Model_FigoTitanium -6.6700 0.181 -36.926 0.000 -7.024 -6.316
Model_Fluence1.5 -9.9159 0.215 -46.065 0.000 -10.338 -9.494
Model_Fluence2.0 -9.6142 0.215 -44.796 0.000 -10.035 -9.193
Model_FluenceDiesel -9.7868 0.187 -52.206 0.000 -10.154 -9.419
Model_Fortuner2.8 -5.7408 0.124 -46.118 0.000 -5.985 -5.497
Model_Fortuner3.0 -5.8331 0.116 -50.390 0.000 -6.060 -5.606
Model_Fortuner4x2 -5.7433 0.116 -49.371 0.000 -5.971 -5.515
Model_Fortuner4x4 -5.7713 0.127 -45.605 0.000 -6.019 -5.523
Model_FortunerTRD -5.8636 0.186 -31.511 0.000 -6.228 -5.499
Model_FortwoCDI -99.1499 1.556 -63.725 0.000 -102.200 -96.099
Model_FreestyleTitanium -5.6613 0.140 -40.353 0.000 -5.936 -5.386
Model_FusionPlus -6.0965 0.177 -34.478 0.000 -6.443 -5.750
Model_GL-Class2007 -5.1038 0.118 -43.332 0.000 -5.335 -4.873
Model_GL-Class350 -5.1340 0.149 -34.357 0.000 -5.427 -4.841
Model_GLAClass -5.7046 0.104 -54.912 0.000 -5.908 -5.501
Model_GLC220 -5.4271 0.141 -38.385 0.000 -5.704 -5.150
Model_GLC220d -5.3741 0.141 -38.093 0.000 -5.651 -5.098
Model_GLC43 -5.3052 0.182 -29.170 0.000 -5.662 -4.949
Model_GLE250d -5.1583 0.119 -43.309 0.000 -5.392 -4.925
Model_GLE350d -5.2360 0.113 -46.495 0.000 -5.457 -5.015
Model_GLS350d -5.0848 0.151 -33.597 0.000 -5.382 -4.788
Model_GONXT -28.3479 0.464 -61.029 0.000 -29.259 -27.437
Model_GOPlus -28.2127 0.461 -61.159 0.000 -29.117 -27.308
Model_GOT -28.5319 0.466 -61.167 0.000 -29.446 -27.617
Model_GallardoCoupe -97.7633 1.557 -62.777 0.000 -100.817 -94.710
Model_Getz1.3 4.582e-13 7.91e-13 0.579 0.562 -1.09e-12 2.01e-12
Model_Getz1.5 -3.9572 0.163 -24.223 0.000 -4.278 -3.637
Model_GetzGLE -3.7616 0.102 -36.991 0.000 -3.961 -3.562
Model_GetzGLS -4.2214 0.092 -45.985 0.000 -4.401 -4.041
Model_GetzGVS -3.3482 0.162 -20.704 0.000 -3.665 -3.031
Model_GrandVitara 0.5830 0.197 2.953 0.003 0.196 0.970
Model_GrandePunto -15.2656 0.257 -59.464 0.000 -15.769 -14.762
Model_Grandi10 -3.6844 0.063 -58.926 0.000 -3.807 -3.562
Model_HexaXT -3.7729 0.143 -26.309 0.000 -4.054 -3.492
Model_HexaXTA -3.7857 0.178 -21.261 0.000 -4.135 -3.437
Model_Ignis1.2 -0.2619 0.193 -1.358 0.175 -0.640 0.116
Model_Ignis1.3 0.0508 0.222 0.228 0.819 -0.385 0.487
Model_Ikon1.3 -6.4491 0.110 -58.648 0.000 -6.665 -6.234
Model_Ikon1.4 -7.2269 0.179 -40.318 0.000 -7.578 -6.875
Model_Ikon1.6 -6.6108 0.173 -38.118 0.000 -6.951 -6.271
Model_IndicaDLS -5.3396 0.116 -46.008 0.000 -5.567 -5.112
Model_IndicaGLS -5.0438 0.169 -29.839 0.000 -5.375 -4.712
Model_IndicaLEI -4.9374 0.168 -29.340 0.000 -5.267 -4.607
Model_IndicaV2 -5.2222 0.092 -57.000 0.000 -5.402 -5.043
Model_IndicaVista -4.9346 0.092 -53.536 0.000 -5.115 -4.754
Model_IndigoCS -4.9044 0.097 -50.510 0.000 -5.095 -4.714
Model_IndigoGLE -5.1794 0.133 -38.836 0.000 -5.441 -4.918
Model_IndigoLS -4.9978 0.106 -47.154 0.000 -5.206 -4.790
Model_IndigoLX -5.0369 0.109 -46.385 0.000 -5.250 -4.824
Model_IndigoXL -2.02e-12 9.41e-13 -2.147 0.032 -3.86e-12 -1.75e-13
Model_IndigoeCS -4.9951 0.105 -47.763 0.000 -5.200 -4.790
Model_Innova2.0 -6.0566 0.134 -45.055 0.000 -6.320 -5.793
Model_Innova2.5 -6.1405 0.114 -53.762 0.000 -6.364 -5.917
Model_InnovaCrysta -6.1272 0.118 -51.854 0.000 -6.359 -5.896
Model_Jazz1.2 -4.1526 0.077 -54.278 0.000 -4.303 -4.003
Model_Jazz1.5 -4.0671 0.085 -48.050 0.000 -4.233 -3.901
Model_JazzActive -4.3858 0.165 -26.652 0.000 -4.708 -4.063
Model_JazzExclusive -4.1900 0.167 -25.148 0.000 -4.517 -3.863
Model_JazzMode -4.3447 0.165 -26.382 0.000 -4.668 -4.022
Model_JazzS -4.2829 0.124 -34.425 0.000 -4.527 -4.039
Model_JazzSelect -4.2641 0.125 -34.169 0.000 -4.509 -4.019
Model_JazzV -4.0543 0.096 -42.359 0.000 -4.242 -3.867
Model_JazzVX -4.1028 0.102 -40.267 0.000 -4.303 -3.903
Model_JeepMM -4.2671 0.128 -33.320 0.000 -4.518 -4.016
Model_Jetta2007-2011 -7.1305 0.128 -55.491 0.000 -7.382 -6.879
Model_Jetta2012-2014 -6.9819 0.135 -51.810 0.000 -7.246 -6.718
Model_Jetta2013-2015 -6.9901 0.130 -53.585 0.000 -7.246 -6.734
Model_KUV100 -4.8161 0.093 -51.601 0.000 -4.999 -4.633
Model_KWID1.0 -10.3027 0.172 -59.733 0.000 -10.641 -9.964
Model_KWIDAMT -10.5141 0.217 -48.431 0.000 -10.940 -10.088
Model_KWIDClimber -10.3249 0.175 -58.880 0.000 -10.669 -9.981
Model_KWIDRXL -10.6211 0.188 -56.634 0.000 -10.989 -10.253
Model_KWIDRXT -10.3963 0.161 -64.647 0.000 -10.712 -10.081
Model_Koleos2.0 -9.2702 0.181 -51.192 0.000 -9.625 -8.915
Model_Lancer1.5 -25.4607 0.404 -63.075 0.000 -26.252 -24.669
Model_LancerGLXD -24.6319 0.407 -60.487 0.000 -25.430 -23.833
Model_Laura1.8 -5.9655 0.129 -46.238 0.000 -6.218 -5.713
Model_Laura1.9 -5.7571 0.120 -48.087 0.000 -5.992 -5.522
Model_LauraAmbiente -5.7958 0.115 -50.222 0.000 -6.022 -5.570
Model_LauraAmbition -5.7674 0.128 -44.972 0.000 -6.019 -5.516
Model_LauraClassic -6.0986 0.178 -34.311 0.000 -6.447 -5.750
Model_LauraElegance -5.8082 0.127 -45.618 0.000 -6.058 -5.559
Model_LauraL -6.0911 0.141 -43.049 0.000 -6.369 -5.814
Model_LauraRS -5.6934 0.180 -31.668 0.000 -6.046 -5.341
Model_Linea1.3 -14.9663 0.284 -52.623 0.000 -15.524 -14.409
Model_LineaClassic -15.3956 0.284 -54.134 0.000 -15.953 -14.838
Model_LineaEmotion -15.0820 0.257 -58.573 0.000 -15.587 -14.577
Model_LineaT -15.4314 0.282 -54.735 0.000 -15.984 -14.879
Model_LineaT-Jet -14.9673 0.285 -52.565 0.000 -15.526 -14.409
Model_Lodgy110PS -9.6851 0.201 -48.158 0.000 -10.079 -9.291
Model_LoganDiesel -4.9742 0.173 -28.832 0.000 -5.312 -4.636
Model_LoganPetrol -5.1021 0.170 -29.953 0.000 -5.436 -4.768
Model_M-ClassML -5.4614 0.098 -55.736 0.000 -5.654 -5.269
Model_MUX4WD -3.281e-12 8.43e-13 -3.889 0.000 -4.93e-12 -1.63e-12
Model_ManzaAqua -4.9587 0.116 -42.575 0.000 -5.187 -4.730
Model_ManzaAura -4.9780 0.108 -46.159 0.000 -5.189 -4.767
Model_ManzaClub -4.8096 0.171 -28.121 0.000 -5.145 -4.474
Model_ManzaELAN -4.5108 0.133 -33.832 0.000 -4.772 -4.249
Model_MicraActive -13.5738 0.223 -60.756 0.000 -14.012 -13.136
Model_MicraDiesel -13.4818 0.212 -63.576 0.000 -13.898 -13.066
Model_MicraXE -13.3985 0.252 -53.151 0.000 -13.893 -12.904
Model_MicraXL -13.6417 0.223 -61.171 0.000 -14.079 -13.204
Model_MicraXV -13.3898 0.216 -62.032 0.000 -13.813 -12.967
Model_MobilioE -4.1256 0.171 -24.182 0.000 -4.460 -3.791
Model_MobilioRS -4.0275 0.132 -30.420 0.000 -4.287 -3.768
Model_MobilioS -4.1556 0.103 -40.275 0.000 -4.358 -3.953
Model_MobilioV -3.9936 0.117 -34.095 0.000 -4.223 -3.764
Model_Montero3.2 -24.4339 0.416 -58.706 0.000 -25.250 -23.618
Model_MustangV8 -4.6117 0.196 -23.492 0.000 -4.997 -4.227
Model_NanoCX -5.5041 0.172 -32.001 0.000 -5.841 -5.167
Model_NanoCx -5.7012 0.133 -42.862 0.000 -5.962 -5.440
Model_NanoLX -5.5080 0.132 -41.629 0.000 -5.767 -5.249
Model_NanoLx -5.6141 0.117 -48.139 0.000 -5.843 -5.385
Model_NanoSTD -6.0777 0.173 -35.202 0.000 -6.416 -5.739
Model_NanoTwist -5.4198 0.102 -53.096 0.000 -5.620 -5.220
Model_NanoXT -5.3621 0.134 -39.927 0.000 -5.625 -5.099
Model_NanoXTA -5.1968 0.103 -50.364 0.000 -5.399 -4.994
Model_NewC-Class -5.7923 0.091 -63.940 0.000 -5.970 -5.615
Model_NewSafari -4.4754 0.105 -42.567 0.000 -4.682 -4.269
Model_Nexon1.2 2.531e-12 9.72e-13 2.603 0.009 6.25e-13 4.44e-12
Model_Nexon1.5 -4.1240 0.173 -23.884 0.000 -4.463 -3.785
Model_NuvoSportN6 -4.7146 0.171 -27.545 0.000 -5.050 -4.379
Model_NuvoSportN8 -2.615e-12 8.19e-13 -3.195 0.001 -4.22e-12 -1.01e-12
Model_Octavia1.9 -6.2216 0.177 -35.204 0.000 -6.568 -5.875
Model_Octavia2.0 -5.2702 0.145 -36.472 0.000 -5.554 -4.987
Model_OctaviaAmbiente -6.0928 0.125 -48.593 0.000 -6.339 -5.847
Model_OctaviaAmbition -5.3890 0.124 -43.577 0.000 -5.631 -5.147
Model_OctaviaClassic -6.0666 0.138 -44.022 0.000 -6.337 -5.796
Model_OctaviaElegance -5.3664 0.112 -48.092 0.000 -5.585 -5.148
Model_OctaviaL -6.0409 0.176 -34.270 0.000 -6.387 -5.695
Model_OctaviaRS -6.2112 0.176 -35.205 0.000 -6.557 -5.865
Model_OctaviaRider -6.1500 0.140 -43.943 0.000 -6.424 -5.876
Model_OctaviaStyle -2.409e-12 5.73e-13 -4.206 0.000 -3.53e-12 -1.29e-12
Model_Omni5 -0.4261 0.193 -2.212 0.027 -0.804 -0.048
Model_Omni8 -0.5503 0.173 -3.187 0.001 -0.889 -0.212
Model_OmniE -0.5724 0.186 -3.078 0.002 -0.937 -0.208
Model_OmniMPI -0.5397 0.182 -2.965 0.003 -0.897 -0.183
Model_OneLX -98.8717 1.560 -63.395 0.000 -101.930 -95.814
Model_Optra1.6 -9.3441 0.181 -51.752 0.000 -9.698 -8.990
Model_OptraMagnum -9.4623 0.158 -59.936 0.000 -9.772 -9.153
Model_Outlander2.4 -24.7546 0.402 -61.557 0.000 -25.543 -23.966
Model_Pajero2.8 -24.5173 0.397 -61.800 0.000 -25.295 -23.739
Model_Pajero4X4 -24.7015 0.422 -58.476 0.000 -25.530 -23.873
Model_PajeroSport -24.5237 0.404 -60.764 0.000 -25.315 -23.732
Model_Panamera2010 -21.0899 0.378 -55.822 0.000 -21.831 -20.349
Model_PanameraDiesel -21.1472 0.354 -59.682 0.000 -21.842 -20.453
Model_Passat1.8 -7.2679 0.191 -38.087 0.000 -7.642 -6.894
Model_Passat2.0 1.777e-12 9.31e-13 1.910 0.056 -4.75e-14 3.6e-12
Model_PassatDiesel -6.9680 0.137 -50.840 0.000 -7.237 -6.699
Model_PassatHighline -6.9587 0.190 -36.709 0.000 -7.330 -6.587
Model_Petra1.2 -15.5801 0.275 -56.573 0.000 -16.120 -15.040
Model_PlatinumEtios -1.697e-12 7.22e-13 -2.352 0.019 -3.11e-12 -2.82e-13
Model_Polo1.0 -7.4980 0.193 -38.864 0.000 -7.876 -7.120
Model_Polo1.2 -7.5363 0.121 -62.142 0.000 -7.774 -7.299
Model_Polo1.5 -7.5771 0.124 -61.227 0.000 -7.820 -7.334
Model_PoloDiesel -7.5681 0.120 -63.329 0.000 -7.802 -7.334
Model_PoloGT -7.4551 0.131 -56.957 0.000 -7.712 -7.199
Model_PoloGTI -7.6520 0.160 -47.958 0.000 -7.965 -7.339
Model_PoloIPL -8.871e-13 6.45e-13 -1.375 0.169 -2.15e-12 3.78e-13
Model_PoloPetrol -7.5474 0.119 -63.644 0.000 -7.780 -7.315
Model_PulsePetrol -9.9911 0.218 -45.832 0.000 -10.419 -9.564
Model_PulseRxL -10.0897 0.179 -56.406 0.000 -10.440 -9.739
Model_Punto1.2 -15.4665 0.285 -54.319 0.000 -16.025 -14.908
Model_Punto1.3 -15.2772 0.279 -54.698 0.000 -15.825 -14.730
Model_Punto1.4 -15.4527 0.280 -55.182 0.000 -16.002 -14.904
Model_PuntoEVO 6.954e-13 9.99e-13 0.696 0.486 -1.26e-12 2.65e-12
Model_Q32.0 -6.7812 0.126 -53.993 0.000 -7.027 -6.535
Model_Q32012-2015 -6.7844 0.123 -55.067 0.000 -7.026 -6.543
Model_Q330 -4.369e-12 1.06e-12 -4.103 0.000 -6.46e-12 -2.28e-12
Model_Q335 -6.7832 0.127 -53.427 0.000 -7.032 -6.534
Model_Q52.0 -6.4911 0.118 -55.049 0.000 -6.722 -6.260
Model_Q52008-2012 -6.5431 0.123 -53.283 0.000 -6.784 -6.302
Model_Q53.0 -6.5681 0.152 -43.231 0.000 -6.866 -6.270
Model_Q530 -6.3940 0.124 -51.564 0.000 -6.637 -6.151
Model_Q73.0 -6.4069 0.122 -52.635 0.000 -6.646 -6.168
Model_Q735 -6.3184 0.133 -47.499 0.000 -6.579 -6.058
Model_Q74.2 -6.4492 0.140 -46.133 0.000 -6.723 -6.175
Model_Q745 -6.0710 0.141 -43.205 0.000 -6.346 -5.795
Model_QualisFS -6.1899 0.198 -31.232 0.000 -6.578 -5.801
Model_QualisFleet -6.2484 0.202 -30.868 0.000 -6.645 -5.852
Model_QualisRS -6.1888 0.202 -30.568 0.000 -6.586 -5.792
Model_QuantoC2 -4.9564 0.170 -29.237 0.000 -5.289 -4.624
Model_QuantoC4 -4.7584 0.168 -28.282 0.000 -5.088 -4.429
Model_QuantoC6 -4.7912 0.168 -28.493 0.000 -5.121 -4.461
Model_QuantoC8 -4.7291 0.129 -36.567 0.000 -4.983 -4.476
Model_R-ClassR350 -5.5495 0.133 -41.697 0.000 -5.810 -5.289
Model_RS5Coupe -6.2802 0.164 -38.253 0.000 -6.602 -5.958
Model_Rapid1.5 -5.9734 0.106 -56.206 0.000 -6.182 -5.765
Model_Rapid1.6 -6.0211 0.103 -58.223 0.000 -6.224 -5.818
Model_Rapid2013-2016 9.712e-13 8.39e-13 1.158 0.247 -6.74e-13 2.62e-12
Model_RapidLeisure 1.033e-12 9.85e-13 1.048 0.295 -8.99e-13 2.96e-12
Model_RapidUltima -6.3048 0.180 -35.032 0.000 -6.658 -5.952
Model_RediGO -28.6261 0.466 -61.479 0.000 -29.539 -27.713
Model_RenaultLogan -4.6565 0.167 -27.842 0.000 -4.984 -4.329
Model_RitzAT -2.691e-13 5.82e-13 -0.463 0.644 -1.41e-12 8.71e-13
Model_RitzLDi -0.0513 0.172 -0.297 0.766 -0.389 0.287
Model_RitzLXI 0.0873 0.221 0.394 0.693 -0.347 0.521
Model_RitzLXi -0.0871 0.192 -0.453 0.650 -0.464 0.290
Model_RitzVDI -0.0707 0.222 -0.319 0.750 -0.506 0.364
Model_RitzVDi -0.0407 0.161 -0.252 0.801 -0.357 0.275
Model_RitzVXI -0.0744 0.176 -0.424 0.672 -0.419 0.270
Model_RitzVXi -0.0806 0.170 -0.475 0.635 -0.413 0.252
Model_RitzZDi -0.0922 0.192 -0.479 0.632 -0.469 0.285
Model_RitzZXI 8.212e-13 5.89e-13 1.395 0.163 -3.33e-13 1.98e-12
Model_RitzZXi 0.0438 0.222 0.198 0.843 -0.391 0.478
Model_RoverDiscovery -49.2817 0.784 -62.827 0.000 -50.820 -47.744
Model_RoverFreelander -49.4741 0.779 -63.538 0.000 -51.001 -47.947
Model_RoverRange -48.8920 0.778 -62.862 0.000 -50.417 -47.367
Model_S-Class280 -2.875e-12 9.05e-13 -3.176 0.002 -4.65e-12 -1.1e-12
Model_S-Class320 -5.2182 0.172 -30.265 0.000 -5.556 -4.880
Model_S-ClassS -5.3763 0.176 -30.618 0.000 -5.721 -5.032
Model_S-CrossAlpha -8.134e-13 9.46e-13 -0.860 0.390 -2.67e-12 1.04e-12
Model_S-CrossDelta 0.3236 0.222 1.456 0.145 -0.112 0.759
Model_S-CrossZeta 7.269e-13 7.45e-13 0.975 0.330 -7.35e-13 2.19e-12
Model_S60D3 3.334e-13 6.58e-13 0.506 0.613 -9.57e-13 1.62e-12
Model_S60D4 -21.9055 0.362 -60.454 0.000 -22.616 -21.195
Model_S60D5 -21.9008 0.374 -58.601 0.000 -22.634 -21.168
Model_S802006-2013 -22.4719 0.372 -60.402 0.000 -23.201 -21.742
Model_S80D5 -2.341e-12 7.82e-13 -2.993 0.003 -3.88e-12 -8.07e-13
Model_SClass -5.4716 0.102 -53.520 0.000 -5.672 -5.271
Model_SCross 0.4283 0.173 2.479 0.013 0.090 0.767
Model_SL-ClassSL -5.0114 0.190 -26.389 0.000 -5.384 -4.639
Model_SLC43 -5.2052 0.153 -34.066 0.000 -5.505 -4.906
Model_SLK-Class55 -4.8603 0.191 -25.506 0.000 -5.234 -4.487
Model_SLK-ClassSLK -5.2134 0.149 -35.046 0.000 -5.505 -4.922
Model_SX4Green -0.1877 0.224 -0.838 0.402 -0.627 0.252
Model_SX4S 0.4418 0.171 2.591 0.010 0.107 0.776
Model_SX4VDI 0.1712 0.222 0.772 0.440 -0.264 0.606
Model_SX4Vxi 0.0339 0.167 0.203 0.839 -0.293 0.361
Model_SX4ZDI 0.1272 0.176 0.723 0.469 -0.217 0.472
Model_SX4ZXI 0.1013 0.170 0.596 0.551 -0.232 0.435
Model_SX4Zxi 0.0269 0.176 0.153 0.878 -0.317 0.371
Model_SafariDICOR -4.4303 0.176 -25.135 0.000 -4.776 -4.085
Model_SafariStorme -4.0820 0.113 -36.283 0.000 -4.303 -3.861
Model_Sail1.2 -9.2441 0.183 -50.444 0.000 -9.603 -8.885
Model_SailHatchback -9.5374 0.168 -56.691 0.000 -9.867 -9.208
Model_SailLT -9.6807 0.212 -45.563 0.000 -10.097 -9.264
Model_SantaFe -2.8409 0.089 -31.743 0.000 -3.016 -2.665
Model_SantroAT -3.6891 0.164 -22.455 0.000 -4.011 -3.367
Model_SantroD -3.9024 0.161 -24.238 0.000 -4.218 -3.587
Model_SantroDX 4.293e-13 9.89e-13 0.434 0.664 -1.51e-12 2.37e-12
Model_SantroGLS -3.9380 0.095 -41.662 0.000 -4.123 -3.753
Model_SantroGS -3.6835 0.123 -29.847 0.000 -3.925 -3.441
Model_SantroLP -3.5956 0.162 -22.174 0.000 -3.913 -3.278
Model_SantroLS -4.2427 0.164 -25.836 0.000 -4.565 -3.921
Model_SantroXing -3.9233 0.059 -66.173 0.000 -4.039 -3.807
Model_ScalaDiesel -9.7957 0.218 -44.930 0.000 -10.223 -9.368
Model_ScalaRxL -10.1425 0.187 -54.335 0.000 -10.508 -9.776
Model_Scorpio1.99 -4.0953 0.132 -31.114 0.000 -4.353 -3.837
Model_Scorpio2.6 -4.3486 0.098 -44.514 0.000 -4.540 -4.157
Model_Scorpio2009-2014 -4.1909 0.105 -39.730 0.000 -4.398 -3.984
Model_ScorpioDX -4.1549 0.167 -24.817 0.000 -4.483 -3.827
Model_ScorpioLX -4.3753 0.129 -33.908 0.000 -4.628 -4.122
Model_ScorpioS10 -3.9778 0.116 -34.373 0.000 -4.205 -3.751
Model_ScorpioS2 -1.983e-12 7.01e-13 -2.827 0.005 -3.36e-12 -6.08e-13
Model_ScorpioS4 -4.2234 0.131 -32.255 0.000 -4.480 -3.967
Model_ScorpioS6 -3.9975 0.116 -34.378 0.000 -4.226 -3.770
Model_ScorpioS8 -4.0309 0.133 -30.228 0.000 -4.292 -3.769
Model_ScorpioSLE -4.2456 0.099 -42.817 0.000 -4.440 -4.051
Model_ScorpioSLX 7.218e-13 6.18e-13 1.167 0.243 -4.91e-13 1.93e-12
Model_ScorpioVLX -4.1591 0.086 -48.565 0.000 -4.327 -3.991
Model_Siena1.2 -15.6515 0.276 -56.758 0.000 -16.192 -15.111
Model_Sonata2.4 -3.116e-13 9.35e-13 -0.333 0.739 -2.15e-12 1.52e-12
Model_SonataEmbera -3.0664 0.124 -24.821 0.000 -3.309 -2.824
Model_SonataGOLD -3.6291 0.161 -22.552 0.000 -3.945 -3.314
Model_SonataTransform 1.18e-12 5.49e-13 2.151 0.032 1.04e-13 2.26e-12
Model_Spark1.0 -9.6900 0.169 -57.406 0.000 -10.021 -9.359
Model_SsangyongRexton -3.9533 0.091 -43.240 0.000 -4.133 -3.774
Model_SumoDX -4.2895 0.197 -21.759 0.000 -4.676 -3.903
Model_SumoDelux -4.2354 0.175 -24.177 0.000 -4.579 -3.892
Model_SumoEX -4.6838 0.182 -25.702 0.000 -5.041 -4.327
Model_Sunny2011-2014 -13.3117 0.211 -63.134 0.000 -13.725 -12.898
Model_SunnyDiesel -13.1698 0.255 -51.587 0.000 -13.670 -12.669
Model_SunnyXE -4.512e-13 8.05e-13 -0.560 0.575 -2.03e-12 1.13e-12
Model_SunnyXL -12.9906 0.254 -51.168 0.000 -13.488 -12.493
Model_SunnyXV -13.1788 0.231 -57.144 0.000 -13.631 -12.727
Model_Superb1.8 -5.6058 0.122 -45.888 0.000 -5.845 -5.366
Model_Superb2.5 -4.392e-13 5.63e-13 -0.780 0.435 -1.54e-12 6.65e-13
Model_Superb2.8 -5.6891 0.138 -41.244 0.000 -5.960 -5.419
Model_Superb2009-2014 -4.9904 0.182 -27.432 0.000 -5.347 -4.634
Model_Superb3.6 2.376e-12 8.27e-13 2.873 0.004 7.54e-13 4e-12
Model_SuperbAmbition -5.5036 0.178 -30.850 0.000 -5.853 -5.154
Model_SuperbElegance -5.4992 0.102 -53.887 0.000 -5.699 -5.299
Model_SuperbL&K -5.0246 0.148 -33.986 0.000 -5.315 -4.735
Model_SuperbStyle -5.4082 0.124 -43.545 0.000 -5.652 -5.165
Model_Swift1.3 0.1239 0.167 0.741 0.459 -0.204 0.452
Model_SwiftAMT 0.0855 0.193 0.442 0.659 -0.294 0.465
Model_SwiftDDiS 0.1151 0.182 0.633 0.527 -0.242 0.472
Model_SwiftDzire 0.1869 0.158 1.180 0.238 -0.124 0.498
Model_SwiftLDI 0.0474 0.171 0.278 0.781 -0.287 0.382
Model_SwiftLXI 0.2370 0.181 1.309 0.191 -0.118 0.592
Model_SwiftLXi 0.0941 0.221 0.426 0.670 -0.339 0.527
Model_SwiftLdi 0.1085 0.173 0.628 0.530 -0.230 0.447
Model_SwiftLxi -0.1151 0.176 -0.656 0.512 -0.459 0.229
Model_SwiftRS 0.1850 0.222 0.835 0.404 -0.249 0.619
Model_SwiftVDI 0.1551 0.159 0.975 0.330 -0.157 0.467
Model_SwiftVDi -2.104e-12 5.37e-13 -3.918 0.000 -3.16e-12 -1.05e-12
Model_SwiftVVT 0.1283 0.193 0.666 0.506 -0.249 0.506
Model_SwiftVXI 0.0927 0.161 0.577 0.564 -0.223 0.408
Model_SwiftVXi 0.0176 0.182 0.097 0.923 -0.338 0.374
Model_SwiftVdi 0.1839 0.176 1.047 0.295 -0.160 0.528
Model_SwiftZDI 0.1386 0.222 0.625 0.532 -0.296 0.574
Model_SwiftZDi 0.2342 0.166 1.410 0.159 -0.092 0.560
Model_SwiftZXI 0.2081 0.172 1.208 0.227 -0.130 0.546
Model_TT2.0 -6.4847 0.190 -34.170 0.000 -6.857 -6.113
Model_TT40 -5.9077 0.182 -32.539 0.000 -6.264 -5.552
Model_TUV300 -4.5025 0.098 -46.041 0.000 -4.694 -4.311
Model_TaveraLS -9.3899 0.236 -39.789 0.000 -9.853 -8.927
Model_TaveraLT -8.8982 0.226 -39.349 0.000 -9.342 -8.455
Model_Teana230jM -1.144e-12 1.03e-12 -1.113 0.266 -3.16e-12 8.71e-13
Model_TeanaXV -12.8632 0.262 -49.188 0.000 -13.376 -12.351
Model_TerranoXL -13.0631 0.214 -60.928 0.000 -13.483 -12.643
Model_TerranoXV -13.0035 0.216 -60.286 0.000 -13.426 -12.581
Model_TharCRDe -4.2702 0.116 -36.950 0.000 -4.497 -4.044
Model_TharDI -4.5283 0.172 -26.377 0.000 -4.865 -4.192
Model_Tiago1.2 -4.7574 0.102 -46.745 0.000 -4.957 -4.558
Model_TiagoAMT 1.262e-12 5.72e-13 2.205 0.027 1.4e-13 2.38e-12
Model_TiagoWizz 3.105e-12 6.59e-13 4.714 0.000 1.81e-12 4.4e-12
Model_Tigor1.05 -4.4467 0.173 -25.706 0.000 -4.786 -4.108
Model_Tigor1.2 -4.6836 0.134 -34.861 0.000 -4.947 -4.420
Model_TigorXE -1.755e-12 1.03e-12 -1.698 0.090 -3.78e-12 2.72e-13
Model_Tiguan2.0 -6.3560 0.193 -32.863 0.000 -6.735 -5.977
Model_Tucson2.0 -2.5898 0.166 -15.626 0.000 -2.915 -2.265
Model_TucsonCRDi -2.7089 0.170 -15.939 0.000 -3.042 -2.376
Model_V40Cross -21.8287 0.372 -58.666 0.000 -22.558 -21.099
Model_V40D3 -21.9063 0.364 -60.262 0.000 -22.619 -21.194
Model_Vento1.2 5.416e-13 7.5e-13 0.722 0.470 -9.29e-13 2.01e-12
Model_Vento1.5 -7.3840 0.122 -60.486 0.000 -7.623 -7.145
Model_Vento1.6 -7.4029 0.128 -57.851 0.000 -7.654 -7.152
Model_Vento2013-2015 -7.4946 0.159 -47.015 0.000 -7.807 -7.182
Model_VentoDiesel -7.4081 0.119 -62.234 0.000 -7.641 -7.175
Model_VentoIPL -7.6005 0.156 -48.819 0.000 -7.906 -7.295
Model_VentoKonekt -7.2200 0.190 -37.930 0.000 -7.593 -6.847
Model_VentoMagnific -7.3943 0.190 -38.885 0.000 -7.767 -7.021
Model_VentoPetrol -7.4415 0.120 -62.139 0.000 -7.676 -7.207
Model_VentoSport -7.3029 0.158 -46.343 0.000 -7.612 -6.994
Model_VentoTSI 2.329e-12 8.15e-13 2.859 0.004 7.32e-13 3.93e-12
Model_VentureEX -4.7516 0.180 -26.436 0.000 -5.104 -4.399
Model_Verito1.5 -4.7821 0.118 -40.456 0.000 -5.014 -4.550
Model_Verna1.4 -3.3457 0.087 -38.623 0.000 -3.515 -3.176
Model_Verna1.6 -3.3032 0.062 -52.940 0.000 -3.426 -3.181
Model_VernaCRDi -3.4769 0.069 -50.095 0.000 -3.613 -3.341
Model_VernaSX -3.3096 0.083 -39.665 0.000 -3.473 -3.146
Model_VernaTransform -3.6204 0.081 -44.424 0.000 -3.780 -3.461
Model_VernaVTVT -3.2389 0.079 -40.793 0.000 -3.395 -3.083
Model_VernaXXi -3.6599 0.162 -22.549 0.000 -3.978 -3.342
Model_VernaXi -3.8194 0.163 -23.415 0.000 -4.139 -3.500
Model_VersaDX2 0.0337 0.228 0.148 0.883 -0.414 0.482
Model_VitaraBrezza 0.3896 0.161 2.423 0.015 0.074 0.705
Model_WR-VEdge -4.1562 0.168 -24.778 0.000 -4.485 -3.827
Model_WRVi-VTEC -3.9613 0.129 -30.816 0.000 -4.213 -3.709
Model_WagonR -0.1470 0.158 -0.930 0.352 -0.457 0.163
Model_X-TrailSLX -12.5613 0.233 -53.813 0.000 -13.019 -12.104
Model_X1M -10.0304 0.190 -52.674 0.000 -10.404 -9.657
Model_X1sDrive -10.2879 0.164 -62.667 0.000 -10.610 -9.966
Model_X1sDrive20d -10.3399 0.166 -62.214 0.000 -10.666 -10.014
Model_X1xDrive -9.9258 0.218 -45.568 0.000 -10.353 -9.499
Model_X3xDrive -9.8674 0.177 -55.888 0.000 -10.214 -9.521
Model_X3xDrive20d -10.0053 0.171 -58.548 0.000 -10.340 -9.670
Model_X3xDrive30d -9.9052 0.218 -45.362 0.000 -10.333 -9.477
Model_X52014-2019 -9.6152 0.183 -52.491 0.000 -9.974 -9.256
Model_X53.0d -9.8939 0.177 -55.899 0.000 -10.241 -9.547
Model_X5X5 -9.4532 0.191 -49.367 0.000 -9.829 -9.078
Model_X5xDrive -9.6457 0.168 -57.344 0.000 -9.975 -9.316
Model_X6xDrive -9.3552 0.182 -51.413 0.000 -9.712 -8.998
Model_X6xDrive30d -9.5430 0.189 -50.375 0.000 -9.914 -9.172
Model_XC60D4 -21.8700 0.362 -60.497 0.000 -22.579 -21.161
Model_XC60D5 -21.8938 0.356 -61.539 0.000 -22.591 -21.196
Model_XC902007-2015 -21.6101 0.381 -56.792 0.000 -22.356 -20.864
Model_XE2.0L -6.131e-13 4.43e-13 -1.384 0.166 -1.48e-12 2.56e-13
Model_XEPortfolio -4.061e-13 2.33e-13 -1.743 0.081 -8.63e-13 5.06e-14
Model_XF2.0 -19.7848 0.346 -57.132 0.000 -20.464 -19.106
Model_XF2.2 -19.8963 0.318 -62.619 0.000 -20.519 -19.273
Model_XF3.0 -20.1215 0.318 -63.354 0.000 -20.744 -19.499
Model_XFAero -19.5800 0.339 -57.700 0.000 -20.245 -18.915
Model_XFDiesel -20.0804 0.322 -62.355 0.000 -20.712 -19.449
Model_XJ2.0L -19.2596 0.348 -55.405 0.000 -19.941 -18.578
Model_XJ3.0L -19.4028 0.326 -59.570 0.000 -20.041 -18.764
Model_XJ5.0 -19.5345 0.344 -56.745 0.000 -20.210 -18.860
Model_XUV300W8 -3.9942 0.171 -23.300 0.000 -4.330 -3.658
Model_XUV500AT -3.9524 0.093 -42.556 0.000 -4.134 -3.770
Model_XUV500W10 -3.8637 0.087 -44.164 0.000 -4.035 -3.692
Model_XUV500W4 -4.1634 0.116 -36.038 0.000 -4.390 -3.937
Model_XUV500W6 -4.0692 0.090 -45.260 0.000 -4.245 -3.893
Model_XUV500W7 -4.1544 0.171 -24.323 0.000 -4.489 -3.820
Model_XUV500W8 -4.0323 0.077 -52.450 0.000 -4.183 -3.882
Model_XUV500W9 -4e-13 3.43e-13 -1.167 0.243 -1.07e-12 2.72e-13
Model_Xcent1.1 -3.6344 0.075 -48.770 0.000 -3.781 -3.488
Model_Xcent1.2 -3.6668 0.068 -53.637 0.000 -3.801 -3.533
Model_XenonXT -4.5936 0.122 -37.769 0.000 -4.832 -4.355
Model_XyloD2 -5.0120 0.134 -37.493 0.000 -5.274 -4.750
Model_XyloD4 -4.6961 0.104 -45.327 0.000 -4.899 -4.493
Model_XyloE2 -4.7978 0.169 -28.452 0.000 -5.128 -4.467
Model_XyloE4 -4.5368 0.131 -34.708 0.000 -4.793 -4.281
Model_XyloE8 -4.5919 0.118 -39.042 0.000 -4.823 -4.361
Model_XyloH4 -4.3561 0.172 -25.343 0.000 -4.693 -4.019
Model_YetiAmbition -5.5909 0.130 -43.096 0.000 -5.845 -5.337
Model_YetiElegance -5.4991 0.130 -42.151 0.000 -5.755 -5.243
Model_Z42009-2013 -9.5870 0.194 -49.443 0.000 -9.967 -9.207
Model_ZenEstilo -0.3639 0.166 -2.190 0.029 -0.690 -0.038
Model_ZenLX -0.3016 0.192 -1.574 0.116 -0.677 0.074
Model_ZenLXI 2.961e-17 1.73e-17 1.714 0.087 -4.26e-18 6.35e-17
Model_ZenLXi -0.2961 0.181 -1.639 0.101 -0.650 0.058
Model_ZenVX 0.0019 0.221 0.009 0.993 -0.431 0.435
Model_ZenVXI -0.2790 0.181 -1.541 0.123 -0.634 0.076
Model_ZenVXi -0.1740 0.221 -0.788 0.431 -0.607 0.259
Model_ZestQuadrajet -4.6607 0.113 -41.395 0.000 -4.881 -4.440
Model_ZestRevotron -4.5122 0.099 -45.371 0.000 -4.707 -4.317
Model_i10Asta -3.5636 0.090 -39.566 0.000 -3.740 -3.387
Model_i10Era -3.7607 0.065 -57.629 0.000 -3.889 -3.633
Model_i10Magna -3.6895 0.061 -60.302 0.000 -3.810 -3.570
Model_i10Magna(O) -3.6090 0.162 -22.231 0.000 -3.927 -3.291
Model_i10Sportz -3.7076 0.061 -60.603 0.000 -3.828 -3.588
Model_i201.2 -3.5059 0.064 -55.086 0.000 -3.631 -3.381
Model_i201.4 -3.5404 0.067 -52.771 0.000 -3.672 -3.409
Model_i202015-2017 -3.5293 0.080 -44.362 0.000 -3.685 -3.373
Model_i20Active -3.4474 0.078 -44.056 0.000 -3.601 -3.294
Model_i20Asta -3.3911 0.065 -51.836 0.000 -3.519 -3.263
Model_i20Diesel -3.4774 0.166 -20.945 0.000 -3.803 -3.152
Model_i20Era -3.6378 0.163 -22.263 0.000 -3.958 -3.317
Model_i20Magna -3.5601 0.066 -53.850 0.000 -3.690 -3.430
Model_i20Sportz -3.4857 0.064 -54.491 0.000 -3.611 -3.360
Model_redi-GOS -28.4620 0.466 -61.044 0.000 -29.376 -27.548
Model_redi-GOT -28.4673 0.450 -63.209 0.000 -29.350 -27.584
==============================================================================
Omnibus: 729.914 Durbin-Watson: 1.976
Prob(Omnibus): 0.000 Jarque-Bera (JB): 11194.465
Skew: -0.350 Prob(JB): 0.00
Kurtosis: 10.957 Cond. No. 1.68e+21
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.02e-32. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
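Note [2] flags a near-singular design matrix, consistent with the condition number of 1.68e+21 reported in the summary. The check can be reproduced by taking the ratio of the largest to the smallest singular value of the design matrix; this is a minimal numpy sketch, not the exact routine statsmodels uses:

```python
import numpy as np

def condition_number(X):
    # Ratio of the largest to the smallest singular value of the design
    # matrix; very large values signal multicollinearity / near-singularity.
    singular_values = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return singular_values.max() / singular_values.min()
```

Columns that are (nearly) linear combinations of others, such as one-hot dummies that sum to a constant, are the usual cause of condition numbers this large.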
get_model_score(olsmodel1)
R-square on training set : 0.9632326167142701
R-square on test set : -1.7677825405890375e+83
RMSE on training set : 2.1857914088403443
RMSE on test set : 4.4582786612795244e+42
[0.9632326167142701, -1.7677825405890375e+83, 2.1857914088403443, 4.4582786612795244e+42]
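The training R-squared looks excellent while the test R-squared is astronomically negative, a symptom of the near-singular design matrix flagged in the summary notes. The `get_model_score` helper that prints these numbers is defined earlier in the notebook and not shown in this excerpt; a minimal numpy-only sketch is below (the explicit data arguments are an assumption for self-containment — the original takes only the fitted model and uses the global train/test splits):

```python
import numpy as np

def get_model_score(model, x_train, y_train, x_test, y_test):
    # Compute R-squared and RMSE on both the train and test splits.
    scores = []
    for X, y in ((x_train, y_train), (x_test, y_test)):
        pred = np.asarray(model.predict(X), dtype=float)
        y = np.asarray(y, dtype=float)
        ss_res = np.sum((y - pred) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)     # R-squared
        scores.append(np.sqrt(ss_res / len(y)))  # RMSE
    r2_train, rmse_train, r2_test, rmse_test = scores
    print('R-square on training set :', r2_train,
          'R-square on test set :', r2_test,
          'RMSE on training set :', rmse_train,
          'RMSE on test set :', rmse_test)
    return [r2_train, r2_test, rmse_train, rmse_test]
```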
# Retrieve the coefficients and p-values and store them in a dataframe
olsmod = pd.DataFrame(olsmodel1.params, columns=['coef'])
olsmod['pval'] = olsmodel1.pvalues
# We are looking for the overall significant variables
pval_filter = olsmod['pval'] <= 0.05
imp_vars = olsmod[pval_filter].index.tolist()
# Map the one-hot encoded dummy variables back to the original (un-encoded) variables
sig_var = []
for col in imp_vars:
    first_part = col.split('_')[0]
    for c in cars_data.columns:
        if first_part in c and c not in sig_var:
            sig_var.append(c)
start = '\033[1m'  # ANSI bold
end = '\033[0m'    # ANSI reset
print(start + 'Most overall significant categorical variables of LINEAR REGRESSION are' + end, ':\n', sig_var)
Most overall significant categorical variables of LINEAR REGRESSION are : ['Year', 'Mileage', 'Power', 'kilometers_driven_log', 'Location', 'Fuel_Type', 'Transmission', 'Owner_Type', 'Brand', 'Model']
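The suffix-stripping loop above can be packaged as a standalone helper; a small sketch (the function name and the example columns are hypothetical) makes the mapping from dummy columns back to their source columns explicit:

```python
def original_variables(dummy_cols, base_cols):
    # Map one-hot encoded names such as 'Location_Mumbai' back to the
    # source columns (e.g. 'Location') they were expanded from.
    found = []
    for col in dummy_cols:
        base = col.split('_')[0]
        for c in base_cols:
            if base in c and c not in found:
                found.append(c)
    return found
```

Note that the substring match is deliberately loose: the prefix 'Fuel' matches 'Fuel_Type', which is what lets source columns that themselves contain underscores resolve correctly.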
# import statsmodels.api as sm  # already imported above
# The statsmodels API does not add a constant term by default, so we add it explicitly
x_train = sm.add_constant(X_train)
# Add the constant to the test data as well
x_test = sm.add_constant(X_test)

def build_ols_model(train):
    # Create and fit the OLS model on the log-transformed price
    olsmodel = sm.OLS(y_train["price_log"], train)
    return olsmodel.fit()
# Fit the linear model on the new dataset (with the constant added)
olsmodel2 = build_ols_model(x_train)
print(olsmodel2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: price_log R-squared: 0.973
Model: OLS Adj. R-squared: 0.969
Method: Least Squares F-statistic: 204.2
Date: Thu, 02 Feb 2023 Prob (F-statistic): 0.00
Time: 12:22:34 Log-Likelihood: 2207.0
No. Observations: 4211 AIC: -3134.
Df Residuals: 3571 BIC: 927.1
Df Model: 639
Covariance Type: nonrobust
=============================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------------
Year 0.0997 0.002 65.021 0.000 0.097 0.103
Mileage -0.0039 0.002 -2.493 0.013 -0.007 -0.001
Engine 9.227e-06 2.9e-05 0.318 0.750 -4.76e-05 6.6e-05
Power 0.0014 0.000 4.087 0.000 0.001 0.002
Seats 0.0118 0.019 0.608 0.543 -0.026 0.050
kilometers_driven_log -0.0760 0.005 -14.577 0.000 -0.086 -0.066
Location_Bangalore 0.1733 0.017 10.163 0.000 0.140 0.207
Location_Chennai 0.0485 0.016 2.987 0.003 0.017 0.080
Location_Coimbatore 0.1419 0.015 9.177 0.000 0.112 0.172
Location_Delhi -0.0932 0.016 -5.950 0.000 -0.124 -0.062
Location_Hyderabad 0.1470 0.015 9.765 0.000 0.117 0.177
Location_Jaipur -0.0289 0.017 -1.745 0.081 -0.061 0.004
Location_Kochi -0.0098 0.015 -0.630 0.529 -0.040 0.021
Location_Kolkata -0.2266 0.016 -14.190 0.000 -0.258 -0.195
Location_Mumbai -0.0768 0.015 -5.106 0.000 -0.106 -0.047
Location_Pune -0.0337 0.016 -2.151 0.032 -0.064 -0.003
Fuel_Type_Diesel 0.0166 0.031 0.530 0.596 -0.045 0.078
Fuel_Type_LPG -0.0655 0.076 -0.859 0.390 -0.215 0.084
Fuel_Type_Petrol -0.0912 0.032 -2.852 0.004 -0.154 -0.028
Transmission_Manual -0.0960 0.010 -9.375 0.000 -0.116 -0.076
Owner_Type_Fourth & Above -0.0864 0.074 -1.172 0.241 -0.231 0.058
Owner_Type_Second -0.0514 0.008 -6.637 0.000 -0.067 -0.036
Owner_Type_Third -0.1223 0.021 -5.920 0.000 -0.163 -0.082
Brand_Audi -190.4469 3.009 -63.282 0.000 -196.347 -184.546
Brand_BMW -187.0584 2.958 -63.244 0.000 -192.857 -181.259
Brand_Bentley -97.9801 1.556 -62.987 0.000 -101.030 -94.930
Brand_Chevrolet -189.1596 2.963 -63.834 0.000 -194.969 -183.350
Brand_Datsun -170.6479 2.668 -63.953 0.000 -175.879 -165.416
Brand_Fiat -183.4456 2.871 -63.896 0.000 -189.075 -177.817
Brand_Force -98.8717 1.560 -63.395 0.000 -101.930 -95.814
Brand_Ford -192.2491 3.017 -63.715 0.000 -198.165 -186.333
Brand_Honda -194.2222 3.048 -63.726 0.000 -200.198 -188.247
Brand_Hyundai -194.8630 3.054 -63.807 0.000 -200.851 -188.875
Brand_Isuzu -99.0671 1.560 -63.498 0.000 -102.126 -96.008
Brand_Jaguar -176.9958 2.802 -63.174 0.000 -182.489 -171.503
Brand_Jeep -98.8187 1.559 -63.395 0.000 -101.875 -95.762
Brand_Lamborghini -97.7633 1.557 -62.777 0.000 -100.817 -94.710
Brand_Land Rover -147.6478 2.338 -63.164 0.000 -152.231 -143.065
Brand_Mahindra -193.8566 3.047 -63.621 0.000 -199.831 -187.882
Brand_Maruti -198.5188 3.096 -64.122 0.000 -204.589 -192.449
Brand_Mercedes-Benz -191.4421 3.026 -63.258 0.000 -197.376 -185.509
Brand_Mini Cooper -172.4000 2.723 -63.312 0.000 -177.739 -167.061
Brand_Mitsubishi -173.0238 2.722 -63.565 0.000 -178.361 -167.687
Brand_Nissan -185.1204 2.905 -63.729 0.000 -190.816 -179.425
Brand_Porsche -175.0380 2.768 -63.240 0.000 -180.465 -169.611
Brand_Renault -188.5606 2.957 -63.770 0.000 -194.358 -182.763
Brand_Skoda -192.2906 3.018 -63.721 0.000 -198.207 -186.374
Brand_Smart -99.1499 1.556 -63.725 0.000 -102.200 -96.099
Brand_Tata -194.0216 3.035 -63.925 0.000 -199.972 -188.071
Brand_Toyota -191.5153 3.012 -63.582 0.000 -197.421 -185.610
Brand_Volkswagen -190.8784 2.997 -63.682 0.000 -196.755 -185.002
Brand_Volvo -175.3870 2.768 -63.362 0.000 -180.814 -169.960
Model_1Series -10.4063 0.188 -55.323 0.000 -10.775 -10.037
Model_3Series -10.2546 0.155 -66.004 0.000 -10.559 -9.950
Model_5Series -9.9958 0.157 -63.744 0.000 -10.303 -9.688
Model_6Series -9.3836 0.174 -53.847 0.000 -9.725 -9.042
Model_7Series -9.5631 0.166 -57.505 0.000 -9.889 -9.237
Model_800AC -0.7719 0.177 -4.371 0.000 -1.118 -0.426
Model_800DX 1.018e-11 8.9e-13 11.441 0.000 8.44e-12 1.19e-11
Model_800Std -0.5967 0.193 -3.097 0.002 -0.974 -0.219
Model_A-StarAT -0.2263 0.192 -1.178 0.239 -0.603 0.150
Model_A-StarLxi -0.1284 0.181 -0.710 0.478 -0.483 0.226
Model_A-StarVxi -0.0980 0.172 -0.571 0.568 -0.435 0.239
Model_A335 -6.9104 0.132 -52.516 0.000 -7.168 -6.652
Model_A41.8 -6.8775 0.149 -46.248 0.000 -7.169 -6.586
Model_A42.0 -6.8241 0.107 -63.648 0.000 -7.034 -6.614
Model_A43.0 -7.0016 0.135 -51.879 0.000 -7.266 -6.737
Model_A43.2 8.799e-12 1.29e-12 6.820 0.000 6.27e-12 1.13e-11
Model_A430 -6.8209 0.186 -36.671 0.000 -7.186 -6.456
Model_A435 -6.8321 0.123 -55.571 0.000 -7.073 -6.591
Model_A4New -6.9809 0.136 -51.180 0.000 -7.248 -6.713
Model_A62.0 -6.2528 0.185 -33.880 0.000 -6.615 -5.891
Model_A62.7 -6.8744 0.129 -53.346 0.000 -7.127 -6.622
Model_A62.8 -6.7536 0.182 -37.179 0.000 -7.110 -6.397
Model_A62011-2015 -6.6467 0.111 -59.636 0.000 -6.865 -6.428
Model_A63.0 -6.9279 0.134 -51.887 0.000 -7.190 -6.666
Model_A635 -6.4954 0.127 -51.293 0.000 -6.744 -6.247
Model_A72011-2015 -6.1878 0.188 -32.959 0.000 -6.556 -5.820
Model_A8L -5.7975 0.184 -31.523 0.000 -6.158 -5.437
Model_AClass -5.9945 0.114 -52.575 0.000 -6.218 -5.771
Model_AccentCRDi -3.9628 0.120 -32.909 0.000 -4.199 -3.727
Model_AccentExecutive 3.258e-11 3.66e-12 8.890 0.000 2.54e-11 3.98e-11
Model_AccentGLE -3.9540 0.071 -55.455 0.000 -4.094 -3.814
Model_AccentGLS -3.8569 0.103 -37.375 0.000 -4.059 -3.655
Model_Accord2.4 -3.7927 0.078 -48.905 0.000 -3.945 -3.641
Model_Accord2001-2003 -3.6696 0.127 -28.955 0.000 -3.918 -3.421
Model_AccordV6 -4.1531 0.168 -24.789 0.000 -4.482 -3.825
Model_AccordVTi-L -4.0594 0.128 -31.711 0.000 -4.310 -3.808
Model_Alto800 -0.4585 0.160 -2.862 0.004 -0.773 -0.144
Model_AltoGreen -0.4692 0.223 -2.102 0.036 -0.907 -0.031
Model_AltoK10 -0.3591 0.160 -2.247 0.025 -0.673 -0.046
Model_AltoLX -0.6173 0.192 -3.215 0.001 -0.994 -0.241
Model_AltoLXI -3.96e-12 8.81e-13 -4.493 0.000 -5.69e-12 -2.23e-12
Model_AltoLXi -0.2642 0.160 -1.648 0.099 -0.579 0.050
Model_AltoStd -0.2983 0.191 -1.558 0.119 -0.674 0.077
Model_AltoVXi -5.404e-12 6.62e-13 -8.165 0.000 -6.7e-12 -4.11e-12
Model_AltoVxi 0.1386 0.221 0.626 0.531 -0.295 0.572
Model_AmazeE -4.2098 0.085 -49.309 0.000 -4.377 -4.042
Model_AmazeEX -4.3100 0.110 -39.351 0.000 -4.525 -4.095
Model_AmazeS -4.2362 0.071 -59.441 0.000 -4.376 -4.096
Model_AmazeSX -4.2757 0.101 -42.256 0.000 -4.474 -4.077
Model_AmazeV -4.2274 0.127 -33.191 0.000 -4.477 -3.978
Model_AmazeVX -4.1739 0.077 -54.159 0.000 -4.325 -4.023
Model_Ameo1.2 -7.6078 0.128 -59.497 0.000 -7.859 -7.357
Model_Ameo1.5 -7.5577 0.147 -51.337 0.000 -7.846 -7.269
Model_AspireAmbiente -6.2611 0.181 -34.533 0.000 -6.617 -5.906
Model_AspireTitanium -5.9793 0.130 -45.825 0.000 -6.235 -5.723
Model_Aveo1.4 -9.5678 0.170 -56.322 0.000 -9.901 -9.235
Model_Aveo1.6 -9.7178 0.209 -46.450 0.000 -10.128 -9.308
Model_AveoU-VA -9.7213 0.160 -60.940 0.000 -10.034 -9.408
Model_AvventuraMULTIJET -14.9094 0.259 -57.580 0.000 -15.417 -14.402
Model_BClass -6.0582 0.105 -57.530 0.000 -6.265 -5.852
Model_BR-Vi-DTEC 6.159e-13 8.54e-13 0.721 0.471 -1.06e-12 2.29e-12
Model_BR-Vi-VTEC -4.0353 0.134 -30.145 0.000 -4.298 -3.773
Model_BRVi-VTEC -3.9152 0.104 -37.533 0.000 -4.120 -3.711
Model_BalenoAlpha 0.2921 0.163 1.793 0.073 -0.027 0.612
Model_BalenoDelta 0.1067 0.166 0.644 0.520 -0.218 0.431
Model_BalenoLXI -0.4686 0.192 -2.438 0.015 -0.845 -0.092
Model_BalenoRS 0.2623 0.176 1.490 0.136 -0.083 0.607
Model_BalenoSigma 0.0793 0.182 0.436 0.663 -0.277 0.436
Model_BalenoVxi -0.3108 0.192 -1.616 0.106 -0.688 0.066
Model_BalenoZeta 0.1698 0.164 1.035 0.301 -0.152 0.492
Model_BeatDiesel -9.7274 0.154 -62.962 0.000 -10.030 -9.425
Model_BeatLS -9.6412 0.159 -60.686 0.000 -9.953 -9.330
Model_BeatLT -9.6850 0.155 -62.370 0.000 -9.989 -9.381
Model_BeatOption 2.326e-12 7.66e-13 3.038 0.002 8.25e-13 3.83e-12
Model_Beetle2.0 1.4e-12 8e-13 1.749 0.080 -1.69e-13 2.97e-12
Model_BoleroDI -4.3241 0.168 -25.667 0.000 -4.654 -3.994
Model_BoleroSLE -4.6119 0.170 -27.160 0.000 -4.945 -4.279
Model_BoleroSLX -4.3986 0.169 -25.971 0.000 -4.731 -4.067
Model_BoleroVLX -4.3062 0.170 -25.279 0.000 -4.640 -3.972
Model_BoleroZLX -4.3942 0.098 -44.822 0.000 -4.586 -4.202
Model_BoleromHAWK 3.808e-12 7.56e-13 5.040 0.000 2.33e-12 5.29e-12
Model_BoltQuadrajet -4.6461 0.136 -34.071 0.000 -4.913 -4.379
Model_BoltRevotron -4.8586 0.172 -28.265 0.000 -5.196 -4.522
Model_BoxsterS -1.907e-12 7.73e-13 -2.468 0.014 -3.42e-12 -3.92e-13
Model_Brio1.2 -4.3234 0.101 -42.859 0.000 -4.521 -4.126
Model_BrioE -4.3443 0.166 -26.130 0.000 -4.670 -4.018
Model_BrioEX -4.4702 0.167 -26.704 0.000 -4.798 -4.142
Model_BrioS -4.3564 0.071 -60.978 0.000 -4.496 -4.216
Model_BrioV -4.3053 0.090 -48.050 0.000 -4.481 -4.130
Model_BrioVX -4.3074 0.088 -48.723 0.000 -4.481 -4.134
Model_C-ClassProgressive -5.8321 0.148 -39.441 0.000 -6.122 -5.542
Model_CLA200 -5.8348 0.104 -55.887 0.000 -6.039 -5.630
Model_CLS-Class2006-2010 -5.2682 0.175 -30.183 0.000 -5.610 -4.926
Model_CR-V2.0 2.984e-12 8.38e-13 3.559 0.000 1.34e-12 4.63e-12
Model_CR-V2.0L -3.4664 0.095 -36.526 0.000 -3.652 -3.280
Model_CR-V2.4 -3.6665 0.090 -40.900 0.000 -3.842 -3.491
Model_CR-V2.4L -3.3975 0.101 -33.706 0.000 -3.595 -3.200
Model_CR-VAT -5.736e-13 8.67e-13 -0.662 0.508 -2.27e-12 1.13e-12
Model_CR-VPetrol -2.473e-12 6.82e-13 -3.626 0.000 -3.81e-12 -1.14e-12
Model_CR-VRVi -3.3289 0.126 -26.476 0.000 -3.575 -3.082
Model_CR-VSport -3.1921 0.168 -19.021 0.000 -3.521 -2.863
Model_Camry2.5 -5.5548 0.186 -29.843 0.000 -5.920 -5.190
Model_CamryA/T -8.829e-13 5.96e-13 -1.482 0.138 -2.05e-12 2.85e-13
Model_CamryHybrid -5.5055 0.138 -40.038 0.000 -5.775 -5.236
Model_CamryW2 -6.5387 0.182 -35.919 0.000 -6.896 -6.182
Model_CamryW4 -6.6148 0.182 -36.353 0.000 -6.972 -6.258
Model_CaptivaLT -2.426e-13 8.2e-13 -0.296 0.767 -1.85e-12 1.37e-12
Model_CaptivaLTZ -9.1088 0.212 -42.900 0.000 -9.525 -8.693
Model_Captur1.5 -9.4996 0.189 -50.314 0.000 -9.870 -9.129
Model_Cayenne2009-2014 -21.8549 0.362 -60.438 0.000 -22.564 -21.146
Model_CayenneBase -25.2074 0.380 -66.343 0.000 -25.952 -24.462
Model_CayenneDiesel -21.4012 0.377 -56.837 0.000 -22.139 -20.663
Model_CayenneS -21.6049 0.380 -56.876 0.000 -22.350 -20.860
Model_CayenneTurbo -21.6669 0.378 -57.262 0.000 -22.409 -20.925
Model_Cayman2009-2012 -21.0656 0.376 -56.016 0.000 -21.803 -20.328
Model_CediaSports 1.898e-12 8.27e-13 2.296 0.022 2.77e-13 3.52e-12
Model_CelerioCNG -0.0125 0.224 -0.056 0.955 -0.451 0.426
Model_CelerioLDi -0.4440 0.222 -2.001 0.046 -0.879 -0.009
Model_CelerioLXI -0.1513 0.182 -0.831 0.406 -0.508 0.205
Model_CelerioVXI -0.1642 0.161 -1.020 0.308 -0.480 0.151
Model_CelerioZDi -0.1584 0.222 -0.713 0.476 -0.594 0.277
Model_CelerioZXI -0.0793 0.164 -0.484 0.629 -0.401 0.242
Model_Ciaz1.3 0.3287 0.173 1.901 0.057 -0.010 0.668
Model_Ciaz1.4 0.4380 0.177 2.481 0.013 0.092 0.784
Model_CiazAT 0.3220 0.193 1.670 0.095 -0.056 0.700
Model_CiazAlpha 3.153e-12 9.59e-13 3.287 0.001 1.27e-12 5.03e-12
Model_CiazRS 0.5411 0.222 2.436 0.015 0.106 0.977
Model_CiazVDI 0.2534 0.182 1.393 0.164 -0.103 0.610
Model_CiazVDi 0.3388 0.169 2.007 0.045 0.008 0.670
Model_CiazVXi 0.3646 0.176 2.066 0.039 0.019 0.711
Model_CiazZDi 0.4153 0.164 2.534 0.011 0.094 0.737
Model_CiazZXi 0.4435 0.170 2.602 0.009 0.109 0.778
Model_CiazZeta 0.4020 0.193 2.083 0.037 0.024 0.780
Model_City1.3 -4.1017 0.095 -43.123 0.000 -4.288 -3.915
Model_City1.5 -4.0157 0.065 -61.754 0.000 -4.143 -3.888
Model_CityCorporate -3.9912 0.166 -24.090 0.000 -4.316 -3.666
Model_CityV -3.9466 0.076 -52.148 0.000 -4.095 -3.798
Model_CityZX -4.1572 0.073 -56.656 0.000 -4.301 -4.013
Model_Cityi -3.8646 0.069 -55.892 0.000 -4.000 -3.729
Model_Cityi-DTEC -3.6114 0.128 -28.140 0.000 -3.863 -3.360
Model_Cityi-VTEC -3.8434 0.078 -49.402 0.000 -3.996 -3.691
Model_Civic2006-2010 -4.0615 0.073 -55.839 0.000 -4.204 -3.919
Model_Civic2010-2013 -4.1479 0.086 -47.979 0.000 -4.317 -3.978
Model_Classic1.4 2.994e-12 6.37e-13 4.702 0.000 1.75e-12 4.24e-12
Model_ClassicNova -198.4473 3.106 -63.893 0.000 -204.537 -192.358
Model_ClubmanCooper -24.5038 0.417 -58.772 0.000 -25.321 -23.686
Model_Compass1.4 -1.347e-12 7.93e-13 -1.698 0.090 -2.9e-12 2.09e-13
Model_Compass2.0 -98.8187 1.559 -63.395 0.000 -101.875 -95.762
Model_ContinentalFlying -97.9801 1.556 -62.987 0.000 -101.030 -94.930
Model_Cooper3 -24.6235 0.395 -62.333 0.000 -25.398 -23.849
Model_Cooper5 -24.7427 0.397 -62.305 0.000 -25.521 -23.964
Model_CooperConvertible -24.4751 0.396 -61.837 0.000 -25.251 -23.699
Model_CooperCountryman -24.7224 0.401 -61.583 0.000 -25.509 -23.935
Model_CooperS -24.3721 0.400 -60.933 0.000 -25.156 -23.588
Model_Corolla1.8 -6.1820 0.184 -33.655 0.000 -6.542 -5.822
Model_CorollaAltis -6.3750 0.106 -59.873 0.000 -6.584 -6.166
Model_CorollaDX -6.7961 0.177 -38.370 0.000 -7.143 -6.449
Model_CorollaExecutive -6.6601 0.180 -37.061 0.000 -7.012 -6.308
Model_CorollaH2 -7.0250 0.180 -39.073 0.000 -7.378 -6.673
Model_CorollaH4 -6.7367 0.122 -55.083 0.000 -6.976 -6.497
Model_CorollaH5 -6.8531 0.145 -47.104 0.000 -7.138 -6.568
Model_CountrymanCooper -24.9604 0.417 -59.853 0.000 -25.778 -24.143
Model_Creta1.4 -3.0382 0.088 -34.400 0.000 -3.211 -2.865
Model_Creta1.6 -3.0055 0.066 -45.418 0.000 -3.135 -2.876
Model_CrossPolo1.5 -7.5774 0.158 -47.815 0.000 -7.888 -7.267
Model_CruzeLTZ -9.0739 0.161 -56.266 0.000 -9.390 -8.758
Model_D-MAXV-Cross -99.0671 1.560 -63.498 0.000 -102.126 -96.008
Model_Duster110PS -9.5441 0.161 -59.357 0.000 -9.859 -9.229
Model_Duster85PS -9.6185 0.161 -59.782 0.000 -9.934 -9.303
Model_DusterAdventure -9.6232 0.217 -44.358 0.000 -10.049 -9.198
Model_DusterPetrol -9.8249 0.219 -44.916 0.000 -10.254 -9.396
Model_DusterRXZ 7.797e-13 7.49e-13 1.041 0.298 -6.89e-13 2.25e-12
Model_DzireAMT 0.1008 0.177 0.570 0.569 -0.246 0.448
Model_DzireLDI 0.0997 0.223 0.448 0.654 -0.337 0.536
Model_DzireNew 0.3045 0.222 1.369 0.171 -0.132 0.741
Model_DzireVDI 0.2813 0.176 1.595 0.111 -0.065 0.627
Model_DzireVXI 0.2521 0.181 1.390 0.165 -0.104 0.608
Model_DzireZDI 0.2884 0.193 1.494 0.135 -0.090 0.667
Model_E-Class200 3.363e-12 1.14e-12 2.940 0.003 1.12e-12 5.61e-12
Model_E-Class2009-2013 -5.6691 0.094 -60.194 0.000 -5.854 -5.484
Model_E-Class2015-2017 -5.5356 0.103 -53.498 0.000 -5.738 -5.333
Model_E-Class220 9.172e-13 8.03e-13 1.143 0.253 -6.57e-13 2.49e-12
Model_E-Class230 -5.9963 0.124 -48.173 0.000 -6.240 -5.752
Model_E-Class250 -5.7455 0.167 -34.356 0.000 -6.073 -5.418
Model_E-Class280 -5.9938 0.109 -55.188 0.000 -6.207 -5.781
Model_E-ClassE -5.2590 0.141 -37.223 0.000 -5.536 -4.982
Model_E-ClassE250 -5.6625 0.111 -51.194 0.000 -5.879 -5.446
Model_E-ClassE270 -5.8159 0.171 -33.989 0.000 -6.151 -5.480
Model_E-ClassE350 -5.4948 0.177 -31.026 0.000 -5.842 -5.148
Model_E-ClassE400 -5.0602 0.178 -28.381 0.000 -5.410 -4.711
Model_E-ClassFacelift -5.5342 0.179 -30.920 0.000 -5.885 -5.183
Model_EON1.0 -1.499e-12 8.93e-13 -1.680 0.093 -3.25e-12 2.51e-13
Model_EOND -3.9973 0.075 -53.633 0.000 -4.143 -3.851
Model_EONEra -4.0156 0.073 -55.176 0.000 -4.158 -3.873
Model_EONLPG -4.633e-13 7.84e-13 -0.591 0.555 -2e-12 1.07e-12
Model_EONMagna -4.0661 0.079 -51.394 0.000 -4.221 -3.911
Model_EONSportz -4.0561 0.108 -37.581 0.000 -4.268 -3.844
Model_EcoSport1.0 -5.8898 0.133 -44.183 0.000 -6.151 -5.628
Model_EcoSport1.5 -5.9411 0.103 -57.553 0.000 -6.143 -5.739
Model_Ecosport1.0 -5.8605 0.180 -32.604 0.000 -6.213 -5.508
Model_Ecosport1.5 -5.8902 0.101 -58.596 0.000 -6.087 -5.693
Model_EcosportSignature -5.9285 0.147 -40.462 0.000 -6.216 -5.641
Model_Eeco5 -0.3819 0.178 -2.144 0.032 -0.731 -0.033
Model_Eeco7 -0.3282 0.172 -1.909 0.056 -0.665 0.009
Model_EecoCNG 1.696e-12 8.65e-13 1.960 0.050 -1.23e-16 3.39e-12
Model_EecoSmiles -7.436e-14 6.05e-13 -0.123 0.902 -1.26e-12 1.11e-12
Model_Elantra1.6 -3.0195 0.124 -24.297 0.000 -3.263 -2.776
Model_Elantra2.0 -2.7589 0.126 -21.923 0.000 -3.006 -2.512
Model_ElantraCRDi -3.0443 0.076 -40.255 0.000 -3.193 -2.896
Model_ElantraSX -2.9220 0.165 -17.670 0.000 -3.246 -2.598
Model_Elitei20 -3.4032 0.078 -43.355 0.000 -3.557 -3.249
Model_Endeavour2.2 -5.0802 0.134 -37.931 0.000 -5.343 -4.818
Model_Endeavour2.5L -5.5773 0.137 -40.693 0.000 -5.846 -5.309
Model_Endeavour3.0L -5.6725 0.128 -44.488 0.000 -5.922 -5.422
Model_Endeavour3.2 -5.0355 0.123 -40.974 0.000 -5.276 -4.794
Model_Endeavour4x2 -5.5870 0.154 -36.214 0.000 -5.890 -5.285
Model_EndeavourHurricane -5.7893 0.153 -37.919 0.000 -6.089 -5.490
Model_EndeavourTitanium -7.985e-14 4.78e-13 -0.167 0.867 -1.02e-12 8.58e-13
Model_EndeavourXLT -5.4897 0.149 -36.882 0.000 -5.782 -5.198
Model_Enjoy1.3 -9.3697 0.192 -48.893 0.000 -9.745 -8.994
Model_Enjoy1.4 -9.3457 0.218 -42.903 0.000 -9.773 -8.919
Model_EnjoyPetrol -9.4752 0.214 -44.216 0.000 -9.895 -9.055
Model_EnjoyTCDi -9.4791 0.180 -52.708 0.000 -9.832 -9.127
Model_ErtigaLXI 0.4891 0.225 2.172 0.030 0.048 0.931
Model_ErtigaPaseo 0.2952 0.226 1.307 0.191 -0.148 0.738
Model_ErtigaSHVS 0.3766 0.173 2.174 0.030 0.037 0.716
Model_ErtigaVDI 0.4251 0.167 2.546 0.011 0.098 0.753
Model_ErtigaVXI 0.4107 0.172 2.390 0.017 0.074 0.747
Model_ErtigaZDI 0.4537 0.167 2.715 0.007 0.126 0.781
Model_ErtigaZXI 0.4604 0.186 2.479 0.013 0.096 0.825
Model_EsteemLX -0.4936 0.222 -2.227 0.026 -0.928 -0.059
Model_EsteemVxi -0.4946 0.175 -2.828 0.005 -0.838 -0.152
Model_EstiloLXI -0.1249 0.181 -0.691 0.489 -0.479 0.229
Model_Etios1.4 -1.591e-12 7.08e-13 -2.246 0.025 -2.98e-12 -2.02e-13
Model_EtiosCross -6.8298 0.130 -52.610 0.000 -7.084 -6.575
Model_EtiosG -6.9041 0.124 -55.793 0.000 -7.147 -6.662
Model_EtiosGD -6.8827 0.124 -55.525 0.000 -7.126 -6.640
Model_EtiosLiva -7.0123 0.111 -63.387 0.000 -7.229 -6.795
Model_EtiosPetrol -6.8233 0.185 -36.954 0.000 -7.185 -6.461
Model_EtiosV -6.8643 0.184 -37.367 0.000 -7.225 -6.504
Model_EtiosVD -6.5880 0.135 -48.863 0.000 -6.852 -6.324
Model_EtiosVX -6.8650 0.149 -45.942 0.000 -7.158 -6.572
Model_EtiosVXD -2.356e-14 9.31e-13 -0.025 0.980 -1.85e-12 1.8e-12
Model_Evalia2013 -13.4928 0.261 -51.715 0.000 -14.004 -12.981
Model_FType -19.3359 0.349 -55.456 0.000 -20.020 -18.652
Model_Fabia1.2 -6.4629 0.115 -56.273 0.000 -6.688 -6.238
Model_Fabia1.2L -6.3157 0.177 -35.664 0.000 -6.663 -5.969
Model_Fabia1.4 -6.0552 0.177 -34.241 0.000 -6.402 -5.708
Model_Fabia1.6 -6.5606 0.179 -36.584 0.000 -6.912 -6.209
Model_Fiesta1.4 -6.3168 0.101 -62.706 0.000 -6.514 -6.119
Model_Fiesta1.5 -6.0775 0.180 -33.728 0.000 -6.431 -5.724
Model_Fiesta1.6 -6.4219 0.119 -53.896 0.000 -6.655 -6.188
Model_FiestaClassic -6.4737 0.118 -54.844 0.000 -6.705 -6.242
Model_FiestaDiesel -5.7724 0.178 -32.492 0.000 -6.121 -5.424
Model_FiestaEXi -6.3823 0.143 -44.766 0.000 -6.662 -6.103
Model_FiestaTitanium 3.366e-13 7.26e-13 0.464 0.643 -1.09e-12 1.76e-12
Model_Figo1.2P 3.353e-13 6e-13 0.558 0.577 -8.42e-13 1.51e-12
Model_Figo1.5D -6.0593 0.141 -42.902 0.000 -6.336 -5.782
Model_Figo2015-2019 -6.3651 0.113 -56.397 0.000 -6.586 -6.144
Model_FigoAspire -6.2831 0.113 -55.662 0.000 -6.504 -6.062
Model_FigoDiesel -6.4529 0.100 -64.730 0.000 -6.648 -6.257
Model_FigoPetrol -6.3359 0.105 -60.412 0.000 -6.541 -6.130
Model_FigoTitanium -6.6700 0.181 -36.926 0.000 -7.024 -6.316
Model_Fluence1.5 -9.9159 0.215 -46.065 0.000 -10.338 -9.494
Model_Fluence2.0 -9.6142 0.215 -44.796 0.000 -10.035 -9.193
Model_FluenceDiesel -9.7868 0.187 -52.206 0.000 -10.154 -9.419
Model_Fortuner2.8 -5.7408 0.124 -46.118 0.000 -5.985 -5.497
Model_Fortuner3.0 -5.8331 0.116 -50.390 0.000 -6.060 -5.606
Model_Fortuner4x2 -5.7433 0.116 -49.371 0.000 -5.971 -5.515
Model_Fortuner4x4 -5.7713 0.127 -45.605 0.000 -6.019 -5.523
Model_FortunerTRD -5.8636 0.186 -31.511 0.000 -6.228 -5.499
Model_FortwoCDI -99.1499 1.556 -63.725 0.000 -102.200 -96.099
Model_FreestyleTitanium -5.6613 0.140 -40.353 0.000 -5.936 -5.386
Model_FusionPlus -6.0965 0.177 -34.478 0.000 -6.443 -5.750
Model_GL-Class2007 -5.1038 0.118 -43.332 0.000 -5.335 -4.873
Model_GL-Class350 -5.1340 0.149 -34.357 0.000 -5.427 -4.841
Model_GLAClass -5.7046 0.104 -54.912 0.000 -5.908 -5.501
Model_GLC220 -5.4271 0.141 -38.385 0.000 -5.704 -5.150
Model_GLC220d -5.3741 0.141 -38.093 0.000 -5.651 -5.098
Model_GLC43 -5.3052 0.182 -29.170 0.000 -5.662 -4.949
Model_GLE250d -5.1583 0.119 -43.309 0.000 -5.392 -4.925
Model_GLE350d -5.2360 0.113 -46.495 0.000 -5.457 -5.015
Model_GLS350d -5.0848 0.151 -33.597 0.000 -5.382 -4.788
Model_GONXT -28.3479 0.464 -61.029 0.000 -29.259 -27.437
Model_GOPlus -28.2127 0.461 -61.159 0.000 -29.117 -27.308
Model_GOT -28.5319 0.466 -61.167 0.000 -29.446 -27.617
Model_GallardoCoupe -97.7633 1.557 -62.777 0.000 -100.817 -94.710
Model_Getz1.3 4.582e-13 7.91e-13 0.579 0.562 -1.09e-12 2.01e-12
Model_Getz1.5 -3.9572 0.163 -24.223 0.000 -4.278 -3.637
Model_GetzGLE -3.7616 0.102 -36.991 0.000 -3.961 -3.562
Model_GetzGLS -4.2214 0.092 -45.985 0.000 -4.401 -4.041
Model_GetzGVS -3.3482 0.162 -20.704 0.000 -3.665 -3.031
Model_GrandVitara 0.5830 0.197 2.953 0.003 0.196 0.970
Model_GrandePunto -15.2656 0.257 -59.464 0.000 -15.769 -14.762
Model_Grandi10 -3.6844 0.063 -58.926 0.000 -3.807 -3.562
Model_HexaXT -3.7729 0.143 -26.309 0.000 -4.054 -3.492
Model_HexaXTA -3.7857 0.178 -21.261 0.000 -4.135 -3.437
Model_Ignis1.2 -0.2619 0.193 -1.358 0.175 -0.640 0.116
Model_Ignis1.3 0.0508 0.222 0.228 0.819 -0.385 0.487
Model_Ikon1.3 -6.4491 0.110 -58.648 0.000 -6.665 -6.234
Model_Ikon1.4 -7.2269 0.179 -40.318 0.000 -7.578 -6.875
Model_Ikon1.6 -6.6108 0.173 -38.118 0.000 -6.951 -6.271
Model_IndicaDLS -5.3396 0.116 -46.008 0.000 -5.567 -5.112
Model_IndicaGLS -5.0438 0.169 -29.839 0.000 -5.375 -4.712
Model_IndicaLEI -4.9374 0.168 -29.340 0.000 -5.267 -4.607
Model_IndicaV2 -5.2222 0.092 -57.000 0.000 -5.402 -5.043
Model_IndicaVista -4.9346 0.092 -53.536 0.000 -5.115 -4.754
Model_IndigoCS -4.9044 0.097 -50.510 0.000 -5.095 -4.714
Model_IndigoGLE -5.1794 0.133 -38.836 0.000 -5.441 -4.918
Model_IndigoLS -4.9978 0.106 -47.154 0.000 -5.206 -4.790
Model_IndigoLX -5.0369 0.109 -46.385 0.000 -5.250 -4.824
Model_IndigoXL -2.02e-12 9.41e-13 -2.147 0.032 -3.86e-12 -1.75e-13
Model_IndigoeCS -4.9951 0.105 -47.763 0.000 -5.200 -4.790
Model_Innova2.0 -6.0566 0.134 -45.055 0.000 -6.320 -5.793
Model_Innova2.5 -6.1405 0.114 -53.762 0.000 -6.364 -5.917
Model_InnovaCrysta -6.1272 0.118 -51.854 0.000 -6.359 -5.896
Model_Jazz1.2 -4.1526 0.077 -54.278 0.000 -4.303 -4.003
Model_Jazz1.5 -4.0671 0.085 -48.050 0.000 -4.233 -3.901
Model_JazzActive -4.3858 0.165 -26.652 0.000 -4.708 -4.063
Model_JazzExclusive -4.1900 0.167 -25.148 0.000 -4.517 -3.863
Model_JazzMode -4.3447 0.165 -26.382 0.000 -4.668 -4.022
Model_JazzS -4.2829 0.124 -34.425 0.000 -4.527 -4.039
Model_JazzSelect -4.2641 0.125 -34.169 0.000 -4.509 -4.019
Model_JazzV -4.0543 0.096 -42.359 0.000 -4.242 -3.867
Model_JazzVX -4.1028 0.102 -40.267 0.000 -4.303 -3.903
Model_JeepMM -4.2671 0.128 -33.320 0.000 -4.518 -4.016
Model_Jetta2007-2011 -7.1305 0.128 -55.491 0.000 -7.382 -6.879
Model_Jetta2012-2014 -6.9819 0.135 -51.810 0.000 -7.246 -6.718
Model_Jetta2013-2015 -6.9901 0.130 -53.585 0.000 -7.246 -6.734
Model_KUV100 -4.8161 0.093 -51.601 0.000 -4.999 -4.633
Model_KWID1.0 -10.3027 0.172 -59.733 0.000 -10.641 -9.964
Model_KWIDAMT -10.5141 0.217 -48.431 0.000 -10.940 -10.088
Model_KWIDClimber -10.3249 0.175 -58.880 0.000 -10.669 -9.981
Model_KWIDRXL -10.6211 0.188 -56.634 0.000 -10.989 -10.253
Model_KWIDRXT -10.3963 0.161 -64.647 0.000 -10.712 -10.081
Model_Koleos2.0 -9.2702 0.181 -51.192 0.000 -9.625 -8.915
Model_Lancer1.5 -25.4607 0.404 -63.075 0.000 -26.252 -24.669
Model_LancerGLXD -24.6319 0.407 -60.487 0.000 -25.430 -23.833
Model_Laura1.8 -5.9655 0.129 -46.238 0.000 -6.218 -5.713
Model_Laura1.9 -5.7571 0.120 -48.087 0.000 -5.992 -5.522
Model_LauraAmbiente -5.7958 0.115 -50.222 0.000 -6.022 -5.570
Model_LauraAmbition -5.7674 0.128 -44.972 0.000 -6.019 -5.516
Model_LauraClassic -6.0986 0.178 -34.311 0.000 -6.447 -5.750
Model_LauraElegance -5.8082 0.127 -45.618 0.000 -6.058 -5.559
Model_LauraL -6.0911 0.141 -43.049 0.000 -6.369 -5.814
Model_LauraRS -5.6934 0.180 -31.668 0.000 -6.046 -5.341
Model_Linea1.3 -14.9663 0.284 -52.623 0.000 -15.524 -14.409
Model_LineaClassic -15.3956 0.284 -54.134 0.000 -15.953 -14.838
Model_LineaEmotion -15.0820 0.257 -58.573 0.000 -15.587 -14.577
Model_LineaT -15.4314 0.282 -54.735 0.000 -15.984 -14.879
Model_LineaT-Jet -14.9673 0.285 -52.565 0.000 -15.526 -14.409
Model_Lodgy110PS -9.6851 0.201 -48.158 0.000 -10.079 -9.291
Model_LoganDiesel -4.9742 0.173 -28.832 0.000 -5.312 -4.636
Model_LoganPetrol -5.1021 0.170 -29.953 0.000 -5.436 -4.768
Model_M-ClassML -5.4614 0.098 -55.736 0.000 -5.654 -5.269
Model_MUX4WD -3.281e-12 8.43e-13 -3.889 0.000 -4.93e-12 -1.63e-12
Model_ManzaAqua -4.9587 0.116 -42.575 0.000 -5.187 -4.730
Model_ManzaAura -4.9780 0.108 -46.159 0.000 -5.189 -4.767
Model_ManzaClub -4.8096 0.171 -28.121 0.000 -5.145 -4.474
Model_ManzaELAN -4.5108 0.133 -33.832 0.000 -4.772 -4.249
Model_MicraActive -13.5738 0.223 -60.756 0.000 -14.012 -13.136
Model_MicraDiesel -13.4818 0.212 -63.576 0.000 -13.898 -13.066
Model_MicraXE -13.3985 0.252 -53.151 0.000 -13.893 -12.904
Model_MicraXL -13.6417 0.223 -61.171 0.000 -14.079 -13.204
Model_MicraXV -13.3898 0.216 -62.032 0.000 -13.813 -12.967
Model_MobilioE -4.1256 0.171 -24.182 0.000 -4.460 -3.791
Model_MobilioRS -4.0275 0.132 -30.420 0.000 -4.287 -3.768
Model_MobilioS -4.1556 0.103 -40.275 0.000 -4.358 -3.953
Model_MobilioV -3.9936 0.117 -34.095 0.000 -4.223 -3.764
Model_Montero3.2 -24.4339 0.416 -58.706 0.000 -25.250 -23.618
Model_MustangV8 -4.6117 0.196 -23.492 0.000 -4.997 -4.227
Model_NanoCX -5.5041 0.172 -32.001 0.000 -5.841 -5.167
Model_NanoCx -5.7012 0.133 -42.862 0.000 -5.962 -5.440
Model_NanoLX -5.5080 0.132 -41.629 0.000 -5.767 -5.249
Model_NanoLx -5.6141 0.117 -48.139 0.000 -5.843 -5.385
Model_NanoSTD -6.0777 0.173 -35.202 0.000 -6.416 -5.739
Model_NanoTwist -5.4198 0.102 -53.096 0.000 -5.620 -5.220
Model_NanoXT -5.3621 0.134 -39.927 0.000 -5.625 -5.099
Model_NanoXTA -5.1968 0.103 -50.364 0.000 -5.399 -4.994
Model_NewC-Class -5.7923 0.091 -63.940 0.000 -5.970 -5.615
Model_NewSafari -4.4754 0.105 -42.567 0.000 -4.682 -4.269
Model_Nexon1.2 2.531e-12 9.72e-13 2.603 0.009 6.25e-13 4.44e-12
Model_Nexon1.5 -4.1240 0.173 -23.884 0.000 -4.463 -3.785
Model_NuvoSportN6 -4.7146 0.171 -27.545 0.000 -5.050 -4.379
Model_NuvoSportN8 -2.615e-12 8.19e-13 -3.195 0.001 -4.22e-12 -1.01e-12
Model_Octavia1.9 -6.2216 0.177 -35.204 0.000 -6.568 -5.875
Model_Octavia2.0 -5.2702 0.145 -36.472 0.000 -5.554 -4.987
Model_OctaviaAmbiente -6.0928 0.125 -48.593 0.000 -6.339 -5.847
Model_OctaviaAmbition -5.3890 0.124 -43.577 0.000 -5.631 -5.147
Model_OctaviaClassic -6.0666 0.138 -44.022 0.000 -6.337 -5.796
Model_OctaviaElegance -5.3664 0.112 -48.092 0.000 -5.585 -5.148
Model_OctaviaL -6.0409 0.176 -34.270 0.000 -6.387 -5.695
Model_OctaviaRS -6.2112 0.176 -35.205 0.000 -6.557 -5.865
Model_OctaviaRider -6.1500 0.140 -43.943 0.000 -6.424 -5.876
Model_OctaviaStyle -2.409e-12 5.73e-13 -4.206 0.000 -3.53e-12 -1.29e-12
Model_Omni5 -0.4261 0.193 -2.212 0.027 -0.804 -0.048
Model_Omni8 -0.5503 0.173 -3.187 0.001 -0.889 -0.212
Model_OmniE -0.5724 0.186 -3.078 0.002 -0.937 -0.208
Model_OmniMPI -0.5397 0.182 -2.965 0.003 -0.897 -0.183
Model_OneLX -98.8717 1.560 -63.395 0.000 -101.930 -95.814
Model_Optra1.6 -9.3441 0.181 -51.752 0.000 -9.698 -8.990
Model_OptraMagnum -9.4623 0.158 -59.936 0.000 -9.772 -9.153
Model_Outlander2.4 -24.7546 0.402 -61.557 0.000 -25.543 -23.966
Model_Pajero2.8 -24.5173 0.397 -61.800 0.000 -25.295 -23.739
Model_Pajero4X4 -24.7015 0.422 -58.476 0.000 -25.530 -23.873
Model_PajeroSport -24.5237 0.404 -60.764 0.000 -25.315 -23.732
Model_Panamera2010 -21.0899 0.378 -55.822 0.000 -21.831 -20.349
Model_PanameraDiesel -21.1472 0.354 -59.682 0.000 -21.842 -20.453
Model_Passat1.8 -7.2679 0.191 -38.087 0.000 -7.642 -6.894
Model_Passat2.0 1.777e-12 9.31e-13 1.910 0.056 -4.75e-14 3.6e-12
Model_PassatDiesel -6.9680 0.137 -50.840 0.000 -7.237 -6.699
Model_PassatHighline -6.9587 0.190 -36.709 0.000 -7.330 -6.587
Model_Petra1.2 -15.5801 0.275 -56.573 0.000 -16.120 -15.040
Model_PlatinumEtios -1.697e-12 7.22e-13 -2.352 0.019 -3.11e-12 -2.82e-13
Model_Polo1.0 -7.4980 0.193 -38.864 0.000 -7.876 -7.120
Model_Polo1.2 -7.5363 0.121 -62.142 0.000 -7.774 -7.299
Model_Polo1.5 -7.5771 0.124 -61.227 0.000 -7.820 -7.334
Model_PoloDiesel -7.5681 0.120 -63.329 0.000 -7.802 -7.334
Model_PoloGT -7.4551 0.131 -56.957 0.000 -7.712 -7.199
Model_PoloGTI -7.6520 0.160 -47.958 0.000 -7.965 -7.339
Model_PoloIPL -8.871e-13 6.45e-13 -1.375 0.169 -2.15e-12 3.78e-13
Model_PoloPetrol -7.5474 0.119 -63.644 0.000 -7.780 -7.315
Model_PulsePetrol -9.9911 0.218 -45.832 0.000 -10.419 -9.564
Model_PulseRxL -10.0897 0.179 -56.406 0.000 -10.440 -9.739
Model_Punto1.2 -15.4665 0.285 -54.319 0.000 -16.025 -14.908
Model_Punto1.3 -15.2772 0.279 -54.698 0.000 -15.825 -14.730
Model_Punto1.4 -15.4527 0.280 -55.182 0.000 -16.002 -14.904
Model_PuntoEVO 6.954e-13 9.99e-13 0.696 0.486 -1.26e-12 2.65e-12
Model_Q32.0 -6.7812 0.126 -53.993 0.000 -7.027 -6.535
Model_Q32012-2015 -6.7844 0.123 -55.067 0.000 -7.026 -6.543
Model_Q330 -4.369e-12 1.06e-12 -4.103 0.000 -6.46e-12 -2.28e-12
Model_Q335 -6.7832 0.127 -53.427 0.000 -7.032 -6.534
Model_Q52.0 -6.4911 0.118 -55.049 0.000 -6.722 -6.260
Model_Q52008-2012 -6.5431 0.123 -53.283 0.000 -6.784 -6.302
Model_Q53.0 -6.5681 0.152 -43.231 0.000 -6.866 -6.270
Model_Q530 -6.3940 0.124 -51.564 0.000 -6.637 -6.151
Model_Q73.0 -6.4069 0.122 -52.635 0.000 -6.646 -6.168
Model_Q735 -6.3184 0.133 -47.499 0.000 -6.579 -6.058
Model_Q74.2 -6.4492 0.140 -46.133 0.000 -6.723 -6.175
Model_Q745 -6.0710 0.141 -43.205 0.000 -6.346 -5.795
Model_QualisFS -6.1899 0.198 -31.232 0.000 -6.578 -5.801
Model_QualisFleet -6.2484 0.202 -30.868 0.000 -6.645 -5.852
Model_QualisRS -6.1888 0.202 -30.568 0.000 -6.586 -5.792
Model_QuantoC2 -4.9564 0.170 -29.237 0.000 -5.289 -4.624
Model_QuantoC4 -4.7584 0.168 -28.282 0.000 -5.088 -4.429
Model_QuantoC6 -4.7912 0.168 -28.493 0.000 -5.121 -4.461
Model_QuantoC8 -4.7291 0.129 -36.567 0.000 -4.983 -4.476
Model_R-ClassR350 -5.5495 0.133 -41.697 0.000 -5.810 -5.289
Model_RS5Coupe -6.2802 0.164 -38.253 0.000 -6.602 -5.958
Model_Rapid1.5 -5.9734 0.106 -56.206 0.000 -6.182 -5.765
Model_Rapid1.6 -6.0211 0.103 -58.223 0.000 -6.224 -5.818
Model_Rapid2013-2016 9.712e-13 8.39e-13 1.158 0.247 -6.74e-13 2.62e-12
Model_RapidLeisure 1.033e-12 9.85e-13 1.048 0.295 -8.99e-13 2.96e-12
Model_RapidUltima -6.3048 0.180 -35.032 0.000 -6.658 -5.952
Model_RediGO -28.6261 0.466 -61.479 0.000 -29.539 -27.713
Model_RenaultLogan -4.6565 0.167 -27.842 0.000 -4.984 -4.329
Model_RitzAT -2.691e-13 5.82e-13 -0.463 0.644 -1.41e-12 8.71e-13
Model_RitzLDi -0.0513 0.172 -0.297 0.766 -0.389 0.287
Model_RitzLXI 0.0873 0.221 0.394 0.693 -0.347 0.521
Model_RitzLXi -0.0871 0.192 -0.453 0.650 -0.464 0.290
Model_RitzVDI -0.0707 0.222 -0.319 0.750 -0.506 0.364
Model_RitzVDi -0.0407 0.161 -0.252 0.801 -0.357 0.275
Model_RitzVXI -0.0744 0.176 -0.424 0.672 -0.419 0.270
Model_RitzVXi -0.0806 0.170 -0.475 0.635 -0.413 0.252
Model_RitzZDi -0.0922 0.192 -0.479 0.632 -0.469 0.285
Model_RitzZXI 8.212e-13 5.89e-13 1.395 0.163 -3.33e-13 1.98e-12
Model_RitzZXi 0.0438 0.222 0.198 0.843 -0.391 0.478
Model_RoverDiscovery -49.2817 0.784 -62.827 0.000 -50.820 -47.744
Model_RoverFreelander -49.4741 0.779 -63.538 0.000 -51.001 -47.947
Model_RoverRange -48.8920 0.778 -62.862 0.000 -50.417 -47.367
Model_S-Class280 -2.875e-12 9.05e-13 -3.176 0.002 -4.65e-12 -1.1e-12
Model_S-Class320 -5.2182 0.172 -30.265 0.000 -5.556 -4.880
Model_S-ClassS -5.3763 0.176 -30.618 0.000 -5.721 -5.032
Model_S-CrossAlpha -8.134e-13 9.46e-13 -0.860 0.390 -2.67e-12 1.04e-12
Model_S-CrossDelta 0.3236 0.222 1.456 0.145 -0.112 0.759
Model_S-CrossZeta 7.269e-13 7.45e-13 0.975 0.330 -7.35e-13 2.19e-12
Model_S60D3 3.334e-13 6.58e-13 0.506 0.613 -9.57e-13 1.62e-12
Model_S60D4 -21.9055 0.362 -60.454 0.000 -22.616 -21.195
Model_S60D5 -21.9008 0.374 -58.601 0.000 -22.634 -21.168
Model_S802006-2013 -22.4719 0.372 -60.402 0.000 -23.201 -21.742
Model_S80D5 -2.341e-12 7.82e-13 -2.993 0.003 -3.88e-12 -8.07e-13
Model_SClass -5.4716 0.102 -53.520 0.000 -5.672 -5.271
Model_SCross 0.4283 0.173 2.479 0.013 0.090 0.767
Model_SL-ClassSL -5.0114 0.190 -26.389 0.000 -5.384 -4.639
Model_SLC43 -5.2052 0.153 -34.066 0.000 -5.505 -4.906
Model_SLK-Class55 -4.8603 0.191 -25.506 0.000 -5.234 -4.487
Model_SLK-ClassSLK -5.2134 0.149 -35.046 0.000 -5.505 -4.922
Model_SX4Green -0.1877 0.224 -0.838 0.402 -0.627 0.252
Model_SX4S 0.4418 0.171 2.591 0.010 0.107 0.776
Model_SX4VDI 0.1712 0.222 0.772 0.440 -0.264 0.606
Model_SX4Vxi 0.0339 0.167 0.203 0.839 -0.293 0.361
Model_SX4ZDI 0.1272 0.176 0.723 0.469 -0.217 0.472
Model_SX4ZXI 0.1013 0.170 0.596 0.551 -0.232 0.435
Model_SX4Zxi 0.0269 0.176 0.153 0.878 -0.317 0.371
Model_SafariDICOR -4.4303 0.176 -25.135 0.000 -4.776 -4.085
Model_SafariStorme -4.0820 0.113 -36.283 0.000 -4.303 -3.861
Model_Sail1.2 -9.2441 0.183 -50.444 0.000 -9.603 -8.885
Model_SailHatchback -9.5374 0.168 -56.691 0.000 -9.867 -9.208
Model_SailLT -9.6807 0.212 -45.563 0.000 -10.097 -9.264
Model_SantaFe -2.8409 0.089 -31.743 0.000 -3.016 -2.665
Model_SantroAT -3.6891 0.164 -22.455 0.000 -4.011 -3.367
Model_SantroD -3.9024 0.161 -24.238 0.000 -4.218 -3.587
Model_SantroDX 4.293e-13 9.89e-13 0.434 0.664 -1.51e-12 2.37e-12
Model_SantroGLS -3.9380 0.095 -41.662 0.000 -4.123 -3.753
Model_SantroGS -3.6835 0.123 -29.847 0.000 -3.925 -3.441
Model_SantroLP -3.5956 0.162 -22.174 0.000 -3.913 -3.278
Model_SantroLS -4.2427 0.164 -25.836 0.000 -4.565 -3.921
Model_SantroXing -3.9233 0.059 -66.173 0.000 -4.039 -3.807
Model_ScalaDiesel -9.7957 0.218 -44.930 0.000 -10.223 -9.368
Model_ScalaRxL -10.1425 0.187 -54.335 0.000 -10.508 -9.776
Model_Scorpio1.99 -4.0953 0.132 -31.114 0.000 -4.353 -3.837
Model_Scorpio2.6 -4.3486 0.098 -44.514 0.000 -4.540 -4.157
Model_Scorpio2009-2014 -4.1909 0.105 -39.730 0.000 -4.398 -3.984
Model_ScorpioDX -4.1549 0.167 -24.817 0.000 -4.483 -3.827
Model_ScorpioLX -4.3753 0.129 -33.908 0.000 -4.628 -4.122
Model_ScorpioS10 -3.9778 0.116 -34.373 0.000 -4.205 -3.751
Model_ScorpioS2 -1.983e-12 7.01e-13 -2.827 0.005 -3.36e-12 -6.08e-13
Model_ScorpioS4 -4.2234 0.131 -32.255 0.000 -4.480 -3.967
Model_ScorpioS6 -3.9975 0.116 -34.378 0.000 -4.226 -3.770
Model_ScorpioS8 -4.0309 0.133 -30.228 0.000 -4.292 -3.769
Model_ScorpioSLE -4.2456 0.099 -42.817 0.000 -4.440 -4.051
Model_ScorpioSLX 7.218e-13 6.18e-13 1.167 0.243 -4.91e-13 1.93e-12
Model_ScorpioVLX -4.1591 0.086 -48.565 0.000 -4.327 -3.991
Model_Siena1.2 -15.6515 0.276 -56.758 0.000 -16.192 -15.111
Model_Sonata2.4 -3.116e-13 9.35e-13 -0.333 0.739 -2.15e-12 1.52e-12
Model_SonataEmbera -3.0664 0.124 -24.821 0.000 -3.309 -2.824
Model_SonataGOLD -3.6291 0.161 -22.552 0.000 -3.945 -3.314
Model_SonataTransform 1.18e-12 5.49e-13 2.151 0.032 1.04e-13 2.26e-12
Model_Spark1.0 -9.6900 0.169 -57.406 0.000 -10.021 -9.359
Model_SsangyongRexton -3.9533 0.091 -43.240 0.000 -4.133 -3.774
Model_SumoDX -4.2895 0.197 -21.759 0.000 -4.676 -3.903
Model_SumoDelux -4.2354 0.175 -24.177 0.000 -4.579 -3.892
Model_SumoEX -4.6838 0.182 -25.702 0.000 -5.041 -4.327
Model_Sunny2011-2014 -13.3117 0.211 -63.134 0.000 -13.725 -12.898
Model_SunnyDiesel -13.1698 0.255 -51.587 0.000 -13.670 -12.669
Model_SunnyXE -4.512e-13 8.05e-13 -0.560 0.575 -2.03e-12 1.13e-12
Model_SunnyXL -12.9906 0.254 -51.168 0.000 -13.488 -12.493
Model_SunnyXV -13.1788 0.231 -57.144 0.000 -13.631 -12.727
Model_Superb1.8 -5.6058 0.122 -45.888 0.000 -5.845 -5.366
Model_Superb2.5 -4.392e-13 5.63e-13 -0.780 0.435 -1.54e-12 6.65e-13
Model_Superb2.8 -5.6891 0.138 -41.244 0.000 -5.960 -5.419
Model_Superb2009-2014 -4.9904 0.182 -27.432 0.000 -5.347 -4.634
Model_Superb3.6 2.376e-12 8.27e-13 2.873 0.004 7.54e-13 4e-12
Model_SuperbAmbition -5.5036 0.178 -30.850 0.000 -5.853 -5.154
Model_SuperbElegance -5.4992 0.102 -53.887 0.000 -5.699 -5.299
Model_SuperbL&K -5.0246 0.148 -33.986 0.000 -5.315 -4.735
Model_SuperbStyle -5.4082 0.124 -43.545 0.000 -5.652 -5.165
Model_Swift1.3 0.1239 0.167 0.741 0.459 -0.204 0.452
Model_SwiftAMT 0.0855 0.193 0.442 0.659 -0.294 0.465
Model_SwiftDDiS 0.1151 0.182 0.633 0.527 -0.242 0.472
Model_SwiftDzire 0.1869 0.158 1.180 0.238 -0.124 0.498
Model_SwiftLDI 0.0474 0.171 0.278 0.781 -0.287 0.382
Model_SwiftLXI 0.2370 0.181 1.309 0.191 -0.118 0.592
Model_SwiftLXi 0.0941 0.221 0.426 0.670 -0.339 0.527
Model_SwiftLdi 0.1085 0.173 0.628 0.530 -0.230 0.447
Model_SwiftLxi -0.1151 0.176 -0.656 0.512 -0.459 0.229
Model_SwiftRS 0.1850 0.222 0.835 0.404 -0.249 0.619
Model_SwiftVDI 0.1551 0.159 0.975 0.330 -0.157 0.467
Model_SwiftVDi -2.104e-12 5.37e-13 -3.918 0.000 -3.16e-12 -1.05e-12
Model_SwiftVVT 0.1283 0.193 0.666 0.506 -0.249 0.506
Model_SwiftVXI 0.0927 0.161 0.577 0.564 -0.223 0.408
Model_SwiftVXi 0.0176 0.182 0.097 0.923 -0.338 0.374
Model_SwiftVdi 0.1839 0.176 1.047 0.295 -0.160 0.528
Model_SwiftZDI 0.1386 0.222 0.625 0.532 -0.296 0.574
Model_SwiftZDi 0.2342 0.166 1.410 0.159 -0.092 0.560
Model_SwiftZXI 0.2081 0.172 1.208 0.227 -0.130 0.546
Model_TT2.0 -6.4847 0.190 -34.170 0.000 -6.857 -6.113
Model_TT40 -5.9077 0.182 -32.539 0.000 -6.264 -5.552
Model_TUV300 -4.5025 0.098 -46.041 0.000 -4.694 -4.311
Model_TaveraLS -9.3899 0.236 -39.789 0.000 -9.853 -8.927
Model_TaveraLT -8.8982 0.226 -39.349 0.000 -9.342 -8.455
Model_Teana230jM -1.144e-12 1.03e-12 -1.113 0.266 -3.16e-12 8.71e-13
Model_TeanaXV -12.8632 0.262 -49.188 0.000 -13.376 -12.351
Model_TerranoXL -13.0631 0.214 -60.928 0.000 -13.483 -12.643
Model_TerranoXV -13.0035 0.216 -60.286 0.000 -13.426 -12.581
Model_TharCRDe -4.2702 0.116 -36.950 0.000 -4.497 -4.044
Model_TharDI -4.5283 0.172 -26.377 0.000 -4.865 -4.192
Model_Tiago1.2 -4.7574 0.102 -46.745 0.000 -4.957 -4.558
Model_TiagoAMT 1.262e-12 5.72e-13 2.205 0.027 1.4e-13 2.38e-12
Model_TiagoWizz 3.105e-12 6.59e-13 4.714 0.000 1.81e-12 4.4e-12
Model_Tigor1.05 -4.4467 0.173 -25.706 0.000 -4.786 -4.108
Model_Tigor1.2 -4.6836 0.134 -34.861 0.000 -4.947 -4.420
Model_TigorXE -1.755e-12 1.03e-12 -1.698 0.090 -3.78e-12 2.72e-13
Model_Tiguan2.0 -6.3560 0.193 -32.863 0.000 -6.735 -5.977
Model_Tucson2.0 -2.5898 0.166 -15.626 0.000 -2.915 -2.265
Model_TucsonCRDi -2.7089 0.170 -15.939 0.000 -3.042 -2.376
Model_V40Cross -21.8287 0.372 -58.666 0.000 -22.558 -21.099
Model_V40D3 -21.9063 0.364 -60.262 0.000 -22.619 -21.194
Model_Vento1.2 5.416e-13 7.5e-13 0.722 0.470 -9.29e-13 2.01e-12
Model_Vento1.5 -7.3840 0.122 -60.486 0.000 -7.623 -7.145
Model_Vento1.6 -7.4029 0.128 -57.851 0.000 -7.654 -7.152
Model_Vento2013-2015 -7.4946 0.159 -47.015 0.000 -7.807 -7.182
Model_VentoDiesel -7.4081 0.119 -62.234 0.000 -7.641 -7.175
Model_VentoIPL -7.6005 0.156 -48.819 0.000 -7.906 -7.295
Model_VentoKonekt -7.2200 0.190 -37.930 0.000 -7.593 -6.847
Model_VentoMagnific -7.3943 0.190 -38.885 0.000 -7.767 -7.021
Model_VentoPetrol -7.4415 0.120 -62.139 0.000 -7.676 -7.207
Model_VentoSport -7.3029 0.158 -46.343 0.000 -7.612 -6.994
Model_VentoTSI 2.329e-12 8.15e-13 2.859 0.004 7.32e-13 3.93e-12
Model_VentureEX -4.7516 0.180 -26.436 0.000 -5.104 -4.399
Model_Verito1.5 -4.7821 0.118 -40.456 0.000 -5.014 -4.550
Model_Verna1.4 -3.3457 0.087 -38.623 0.000 -3.515 -3.176
Model_Verna1.6 -3.3032 0.062 -52.940 0.000 -3.426 -3.181
Model_VernaCRDi -3.4769 0.069 -50.095 0.000 -3.613 -3.341
Model_VernaSX -3.3096 0.083 -39.665 0.000 -3.473 -3.146
Model_VernaTransform -3.6204 0.081 -44.424 0.000 -3.780 -3.461
Model_VernaVTVT -3.2389 0.079 -40.793 0.000 -3.395 -3.083
Model_VernaXXi -3.6599 0.162 -22.549 0.000 -3.978 -3.342
Model_VernaXi -3.8194 0.163 -23.415 0.000 -4.139 -3.500
Model_VersaDX2 0.0337 0.228 0.148 0.883 -0.414 0.482
Model_VitaraBrezza 0.3896 0.161 2.423 0.015 0.074 0.705
Model_WR-VEdge -4.1562 0.168 -24.778 0.000 -4.485 -3.827
Model_WRVi-VTEC -3.9613 0.129 -30.816 0.000 -4.213 -3.709
Model_WagonR -0.1470 0.158 -0.930 0.352 -0.457 0.163
Model_X-TrailSLX -12.5613 0.233 -53.813 0.000 -13.019 -12.104
Model_X1M -10.0304 0.190 -52.674 0.000 -10.404 -9.657
Model_X1sDrive -10.2879 0.164 -62.667 0.000 -10.610 -9.966
Model_X1sDrive20d -10.3399 0.166 -62.214 0.000 -10.666 -10.014
Model_X1xDrive -9.9258 0.218 -45.568 0.000 -10.353 -9.499
Model_X3xDrive -9.8674 0.177 -55.888 0.000 -10.214 -9.521
Model_X3xDrive20d -10.0053 0.171 -58.548 0.000 -10.340 -9.670
Model_X3xDrive30d -9.9052 0.218 -45.362 0.000 -10.333 -9.477
Model_X52014-2019 -9.6152 0.183 -52.491 0.000 -9.974 -9.256
Model_X53.0d -9.8939 0.177 -55.899 0.000 -10.241 -9.547
Model_X5X5 -9.4532 0.191 -49.367 0.000 -9.829 -9.078
Model_X5xDrive -9.6457 0.168 -57.344 0.000 -9.975 -9.316
Model_X6xDrive -9.3552 0.182 -51.413 0.000 -9.712 -8.998
Model_X6xDrive30d -9.5430 0.189 -50.375 0.000 -9.914 -9.172
Model_XC60D4 -21.8700 0.362 -60.497 0.000 -22.579 -21.161
Model_XC60D5 -21.8938 0.356 -61.539 0.000 -22.591 -21.196
Model_XC902007-2015 -21.6101 0.381 -56.792 0.000 -22.356 -20.864
Model_XE2.0L -6.131e-13 4.43e-13 -1.384 0.166 -1.48e-12 2.56e-13
Model_XEPortfolio -4.061e-13 2.33e-13 -1.743 0.081 -8.63e-13 5.06e-14
Model_XF2.0 -19.7848 0.346 -57.132 0.000 -20.464 -19.106
Model_XF2.2 -19.8963 0.318 -62.619 0.000 -20.519 -19.273
Model_XF3.0 -20.1215 0.318 -63.354 0.000 -20.744 -19.499
Model_XFAero -19.5800 0.339 -57.700 0.000 -20.245 -18.915
Model_XFDiesel -20.0804 0.322 -62.355 0.000 -20.712 -19.449
Model_XJ2.0L -19.2596 0.348 -55.405 0.000 -19.941 -18.578
Model_XJ3.0L -19.4028 0.326 -59.570 0.000 -20.041 -18.764
Model_XJ5.0 -19.5345 0.344 -56.745 0.000 -20.210 -18.860
Model_XUV300W8 -3.9942 0.171 -23.300 0.000 -4.330 -3.658
Model_XUV500AT -3.9524 0.093 -42.556 0.000 -4.134 -3.770
Model_XUV500W10 -3.8637 0.087 -44.164 0.000 -4.035 -3.692
Model_XUV500W4 -4.1634 0.116 -36.038 0.000 -4.390 -3.937
Model_XUV500W6 -4.0692 0.090 -45.260 0.000 -4.245 -3.893
Model_XUV500W7 -4.1544 0.171 -24.323 0.000 -4.489 -3.820
Model_XUV500W8 -4.0323 0.077 -52.450 0.000 -4.183 -3.882
Model_XUV500W9 -4e-13 3.43e-13 -1.167 0.243 -1.07e-12 2.72e-13
Model_Xcent1.1 -3.6344 0.075 -48.770 0.000 -3.781 -3.488
Model_Xcent1.2 -3.6668 0.068 -53.637 0.000 -3.801 -3.533
Model_XenonXT -4.5936 0.122 -37.769 0.000 -4.832 -4.355
Model_XyloD2 -5.0120 0.134 -37.493 0.000 -5.274 -4.750
Model_XyloD4 -4.6961 0.104 -45.327 0.000 -4.899 -4.493
Model_XyloE2 -4.7978 0.169 -28.452 0.000 -5.128 -4.467
Model_XyloE4 -4.5368 0.131 -34.708 0.000 -4.793 -4.281
Model_XyloE8 -4.5919 0.118 -39.042 0.000 -4.823 -4.361
Model_XyloH4 -4.3561 0.172 -25.343 0.000 -4.693 -4.019
Model_YetiAmbition -5.5909 0.130 -43.096 0.000 -5.845 -5.337
Model_YetiElegance -5.4991 0.130 -42.151 0.000 -5.755 -5.243
Model_Z42009-2013 -9.5870 0.194 -49.443 0.000 -9.967 -9.207
Model_ZenEstilo -0.3639 0.166 -2.190 0.029 -0.690 -0.038
Model_ZenLX -0.3016 0.192 -1.574 0.116 -0.677 0.074
Model_ZenLXI 2.961e-17 1.73e-17 1.714 0.087 -4.26e-18 6.35e-17
Model_ZenLXi -0.2961 0.181 -1.639 0.101 -0.650 0.058
Model_ZenVX 0.0019 0.221 0.009 0.993 -0.431 0.435
Model_ZenVXI -0.2790 0.181 -1.541 0.123 -0.634 0.076
Model_ZenVXi -0.1740 0.221 -0.788 0.431 -0.607 0.259
Model_ZestQuadrajet -4.6607 0.113 -41.395 0.000 -4.881 -4.440
Model_ZestRevotron -4.5122 0.099 -45.371 0.000 -4.707 -4.317
Model_i10Asta -3.5636 0.090 -39.566 0.000 -3.740 -3.387
Model_i10Era -3.7607 0.065 -57.629 0.000 -3.889 -3.633
Model_i10Magna -3.6895 0.061 -60.302 0.000 -3.810 -3.570
Model_i10Magna(O) -3.6090 0.162 -22.231 0.000 -3.927 -3.291
Model_i10Sportz -3.7076 0.061 -60.603 0.000 -3.828 -3.588
Model_i201.2 -3.5059 0.064 -55.086 0.000 -3.631 -3.381
Model_i201.4 -3.5404 0.067 -52.771 0.000 -3.672 -3.409
Model_i202015-2017 -3.5293 0.080 -44.362 0.000 -3.685 -3.373
Model_i20Active -3.4474 0.078 -44.056 0.000 -3.601 -3.294
Model_i20Asta -3.3911 0.065 -51.836 0.000 -3.519 -3.263
Model_i20Diesel -3.4774 0.166 -20.945 0.000 -3.803 -3.152
Model_i20Era -3.6378 0.163 -22.263 0.000 -3.958 -3.317
Model_i20Magna -3.5601 0.066 -53.850 0.000 -3.690 -3.430
Model_i20Sportz -3.4857 0.064 -54.491 0.000 -3.611 -3.360
Model_redi-GOS -28.4620 0.466 -61.044 0.000 -29.376 -27.548
Model_redi-GOT -28.4673 0.450 -63.209 0.000 -29.350 -27.584
==============================================================================
Omnibus: 729.914 Durbin-Watson: 1.976
Prob(Omnibus): 0.000 Jarque-Bera (JB): 11194.465
Skew: -0.350 Prob(JB): 0.00
Kurtosis: 10.957 Cond. No. 1.68e+21
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.02e-32. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
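The smallest-eigenvalue warning above (together with the condition number of 1.68e+21) is the classic symptom of the dummy-variable trap: when every level of a categorical such as Model is one-hot encoded and the intercept column is kept, the dummy columns sum to the constant column, so X'X becomes singular. A minimal pure-Python sketch with a hypothetical 3-level categorical:

```python
# Design matrix: intercept + one dummy per level of a 3-level categorical.
# The three dummy columns always sum to the intercept column -> singular X'X.
X = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]

def matmul_t(X):
    """Compute X'X for a list-of-rows matrix."""
    cols = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(cols)]
            for i in range(cols)]

def det(M):
    """Determinant by cofactor expansion (fine for tiny matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += ((-1) ** j) * M[0][j] * det(minor)
    return total

print(det(matmul_t(X)))  # 0 -> X'X is singular
```

Passing `drop_first=True` to `pd.get_dummies` (or dropping one dummy column per categorical by hand) removes the redundant column and restores a well-conditioned design matrix.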
get_model_score_adjusted_R2(olsmodel2)
Adjusted R2 on training set : 0.9554174298292273 Adjusted R2 on test set : -2.9916319917660635e+83 RMSE on training set : 2.1857914088403443 RMSE on test set : 4.4582786612795244e+42
[0.9554174298292273, -2.9916319917660635e+83, 2.1857914088403443, 4.4582786612795244e+42]
pval_filter = olsmod['pval'] <= 0.05
imp_vars = olsmod[pval_filter].index.tolist()
# Recover the overall (un-one-hot-encoded) variables from the significant columns
sig_var = []
for col in imp_vars:
    first_part = col.split('_')[0]  # one-hot columns are named <variable>_<level>
    for c in cars_data.columns:
        if first_part in c and c not in sig_var:
            sig_var.append(c)
start = '\033[1m'  # bold
end = '\033[0m'    # reset formatting
print(start + 'Most overall significant categorical variables of LINEAR REGRESSION are' + end, ':\n', sig_var)
Most overall significant categorical variables of LINEAR REGRESSION are : ['Year', 'Mileage', 'Power', 'kilometers_driven_log', 'Location', 'Fuel_Type', 'Transmission', 'Owner_Type', 'Brand', 'Model']
# Evaluating with adjusted R2 gives just as bad a result as R2 did, so we will drop both OLS models.
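The split-on-underscore idea used above can be sketched on its own with hypothetical column names; matching the longest original column name avoids mapping a dummy like `Fuel_Type_Diesel` back to the wrong base variable:

```python
# Map one-hot encoded column names back to their base variables.
# Column names below are hypothetical examples, not the actual dataset.
imp_vars = ["Model_CooperS", "Model_Swift1.3", "Brand_Maruti",
            "Fuel_Type_Diesel", "Power", "kilometers_driven_log"]
original_cols = {"Model", "Brand", "Fuel_Type", "Power", "kilometers_driven_log"}

sig_vars = []
for col in imp_vars:
    # Longest original column that prefixes the dummy name wins, so
    # "Fuel_Type_Diesel" maps to "Fuel_Type", not to a shorter prefix.
    matches = [c for c in original_cols if col == c or col.startswith(c + "_")]
    if matches:
        base = max(matches, key=len)
        if base not in sig_vars:
            sig_vars.append(base)

print(sorted(sig_vars))
```

This deduplicates as it goes, so each base variable appears once no matter how many of its dummy levels were significant.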
Build Ridge / Lasso regression models similar to the linear regression above:
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html
Ridge
# Import Ridge/ Lasso Regression from sklearn
from sklearn.linear_model import Ridge, Lasso
# Create a Ridge regression model
ridge = Ridge(alpha=1.0)
# Fit Ridge regression model
ridge.fit(X_train,y_train['price_log'])
Ridge()
# Get score of the model
ridge_score = get_model_score(ridge)
R-square on training set : 0.9495426277022899 R-square on test set : 0.9082463599330018 RMSE on training set : 2.508411560689403 RMSE on test set : 3.382079182882414
import numpy as np
# Train the model on the log-price target
ridge.fit(X_train, y_train['price_log'])
# Get the coefficients
coefficients = ridge.coef_
# Get the absolute values of the coefficients
coef_abs = np.abs(coefficients)
# Get the indices of the k largest absolute values
k = 7
most_important = np.argpartition(coef_abs, -k)[-k:]
# Get the corresponding feature names
most_important_features = [X_train.columns[i] for i in most_important]
print(most_important_features)
# **Observations from results:**
# RIDGE
# a) R-square on training set: 0.9495426277022899 and R-square on test set: 0.9082463599330018 indicate
#    that the model performs well on both the training and test sets. A high R-squared value (closer to 1)
#    means the model explains a large proportion of the variance in the data.
#    The R-squared value for the test set is lower than for the training set, which is expected:
#    in general, the test score should be lower than the training score because the model has not
#    seen the test data before.
# b) RMSE on training set: 2.508411560689403 and RMSE on test set: 3.382079182882414 indicate the error
#    of the model on both the training and test sets. RMSE (Root Mean Squared Error) measures the
#    difference between the predicted and actual values. The lower the RMSE, the better the model.
Lasso
#create lasso regression model
lasso=Lasso(alpha=1.0)
#Fit Lasso regression model
lasso.fit(X_train,y_train['price_log'])
Lasso()
# Get score of the model
lasso_score = get_model_score(lasso)
R-square on training set : -3.4840811654275816
R-square on test set : 0.2077302968471606
RMSE on training set : 24.138711598481414
RMSE on test set : 9.43820300687797
Observations from results: Lasso performs very poorly, likely because the default alpha=1.0 over-penalizes the small-scale log-price target, shrinking most coefficients toward zero. A much smaller, cross-validated alpha would be needed.
SUMMARY (For score details, see individual models)
LINEAR REGRESSION Overall, both linear models perform badly. They overfit the training data and do not generalize to unseen data. LinearRegression from scikit-learn and OLS from statsmodels both minimize the same ordinary least squares objective, but statsmodels provides more options and detailed output for OLS models, including hypothesis testing, confidence intervals, and various statistical measures. The get_model_score function returned different values for the two models because implementation details (e.g., intercept handling and the solver used on a near-singular design matrix, as flagged in the OLS notes above) lead to different fitted coefficients.
RIDGE Errors are still high, indicating that the model needs further improvement.
LASSO Gives extremely bad results. Worse than all methods so far.
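The collapse of the Lasso fit is consistent with the default `alpha=1.0` being far too strong for a small-scale target like log price. A minimal sketch on synthetic data (a stand-in, not the cars dataset) showing how cross-validating alpha with `LassoCV` recovers a usable fit:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

# Synthetic stand-in with a small-magnitude target, similar in scale to a log-price column
X, y = make_regression(n_samples=500, n_features=50, n_informative=10,
                       noise=5.0, random_state=1)
y = y / 100.0  # shrink the target to a log-like scale

# Default alpha=1.0 over-penalizes a small-scale target, zeroing out almost every coefficient
strong = Lasso(alpha=1.0).fit(X, y)

# LassoCV cross-validates over a path of alphas instead of using a fixed one
tuned = LassoCV(cv=5, random_state=1).fit(X, y)

print("alpha=1.0 : %d nonzero coefs, train R2 = %.3f"
      % (np.sum(strong.coef_ != 0), strong.score(X, y)))
print("alpha=%.4f: %d nonzero coefs, train R2 = %.3f"
      % (tuned.alpha_, np.sum(tuned.coef_ != 0), tuned.score(X, y)))
```

In the notebook, the same idea would mean fitting `LassoCV` on `X_train` and `y_train['price_log']` rather than a fixed `alpha=1.0`.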
# Import Decision tree for Regression from sklearn
from sklearn.tree import DecisionTreeRegressor
# Create a decision tree regression model, use random_state = 1
dtree = DecisionTreeRegressor(random_state = 1)
# Fit decision tree regression model
dtree.fit(X_train, y_train['price_log'])
DecisionTreeRegressor(random_state=1)
# Get score of the model
Dtree_model = get_model_score(dtree)
R-square on training set : 0.9999991628779447
R-square on test set : 0.8046161304354573
RMSE on training set : 0.010429698294569224
RMSE on test set : 4.687023633168094
Observations from results: _
The model has a very high R-squared on the training set (0.9999991628779447) which indicates that the model is fitting the training data very well. However, the R-squared on the test set (0.8046161304354573) is significantly lower, indicating that the model is overfitting to the training data and not generalizing well to new, unseen data.
The RMSE on the training set (0.010429698294569224) is also very low which confirms that the model is fitting the training data very well, however, the RMSE on the test set (4.687023633168094) is much higher, which indicates that the model is not performing well on unseen data.
Overall, the model is overfitting to the training data.
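Besides the minimum-samples constraints tuned later in the notebook, another standard remedy for an overfit tree is cost-complexity pruning via the `ccp_alpha` parameter of `DecisionTreeRegressor`. A minimal sketch on synthetic data (the `ccp_alpha=50.0` value is an illustrative guess, not tuned for the cars data):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=600, n_features=20, noise=10.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# An unpruned tree memorizes the training data
full = DecisionTreeRegressor(random_state=1).fit(X_tr, y_tr)

# Cost-complexity pruning trades training fit for a simpler tree
pruned = DecisionTreeRegressor(random_state=1, ccp_alpha=50.0).fit(X_tr, y_tr)

print("full  : %4d leaves, train R2 %.3f, test R2 %.3f"
      % (full.get_n_leaves(), full.score(X_tr, y_tr), full.score(X_te, y_te)))
print("pruned: %4d leaves, train R2 %.3f, test R2 %.3f"
      % (pruned.get_n_leaves(), pruned.score(X_tr, y_tr), pruned.score(X_te, y_te)))
```

In practice one would pick `ccp_alpha` from `cost_complexity_pruning_path` or by cross-validation rather than guessing it.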
Print the importance of features in the tree building. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
print(pd.DataFrame(dtree.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
                              Imp
Power                    0.629740
Year                     0.231035
Engine                   0.030364
kilometers_driven_log    0.015019
Mileage                  0.010386
...                           ...
Model_PulsePetrol        0.000000
Model_CLA200             0.000000
Model_PoloIPL            0.000000
Model_CLS-Class2006-2010 0.000000
Model_Q735               0.000000

[738 rows x 1 columns]
#plot graph of feature importances for Decision Tree for better analysis
plt.figure(figsize = (12,8))
feat_importances = pd.Series(dtree.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
plt.show()
Observations and insights: _
Gini importance is a measure of the importance of each feature (predictor variable) in a decision tree or random forest model. It is calculated by measuring the decrease in node impurity each time a feature is used to split the data (and, in a random forest, averaging the result over all trees). For classification trees the impurity is the Gini impurity: a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset. For regression trees such as this one, scikit-learn measures the reduction in squared error instead, but the interpretation is the same: a feature with a high importance contributes more to the prediction of the target variable.
The feature with the highest score, "Power" in this case, is considered to be the most important feature in the model. The feature with the second highest score, "Year" in this case, is considered to be the second most important feature and so on.
The low Gini importance values of "Engine", "kilometers_driven_log" and "Mileage" indicate that these features may not be as important in the prediction of the target variable as the other features.
# Import Randomforest for Regression from sklearn
from sklearn.ensemble import RandomForestRegressor
# Create a Randomforest regression model
clf = RandomForestRegressor(n_estimators=100)
# Fit Randomforest regression model
clf.fit(X_train, y_train['price_log'])
RandomForestRegressor()
# Get score of the model
clf_model = get_model_score(clf)
R-square on training set : 0.9834134401371122
R-square on test set : 0.8758389940624158
RMSE on training set : 1.468099577409237
RMSE on test set : 3.736331500042392
Observations and insights: _
The R-squared value is high on the training set (0.983) but noticeably lower on the test set (0.876). This suggests that the model is overfitting to the training data and not generalizing as well to the test data.
The RMSE is correspondingly low on the training set and higher on the test set, further confirming overfitting.
Feature Importance
# Print important features similar to decision trees
print(pd.DataFrame(clf.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
                           Imp
Power                 0.623214
Year                  0.227837
Engine                0.030693
kilometers_driven_log 0.016297
Mileage               0.013841
...                        ...
Model_Nexon1.2        0.000000
Model_CR-V2.0         0.000000
Model_NuvoSportN8     0.000000
Model_VentoTSI        0.000000
Model_Figo1.2P        0.000000

[738 rows x 1 columns]
#plot graph of feature importances for Random Forest for better analysis
plt.figure(figsize = (12,8))
feat_importances = pd.Series(clf.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
plt.show()
Observations and insights: _
Not much difference in values between Decision Tree and Random Forest. Both indicate overfitting.
#To tune a decision tree, we use the following parameters:
# max_depth: The maximum depth of the tree. Increasing this value will make the model more complex,
# while decreasing it will make the model less complex.
# min_samples_split: The minimum number of samples required to split an internal node.
# Increasing this value will make the model less complex, as it will require more samples to split a node.
# min_samples_leaf: The minimum number of samples required to be at a leaf node.
# Increasing this value will make the model less complex,
# as it will require more samples to be present at a leaf node.
# max_features controls the number of features that are considered when splitting a node.
# "auto" is DEPRECATED, so we can't use it.
# If max_features is "sqrt", the algorithm considers a number of features equal to the square root of the total number of features.
# If max_features is "log2", the algorithm considers log2(total number of features) features.
# If max_features is None, all features are considered when splitting a node.
# importing required libraries (DecisionTreeRegressor was already imported above)
from sklearn.model_selection import GridSearchCV
# Choose the type of estimator
dtree_tuned = DecisionTreeRegressor(random_state = 1)
# Grid of parameters to choose from
# Check documentation for all the parameters that the model takes and play with those
parameters = {'splitter':["best","random"],
# 'max_depth': [1, 3, 5, 7, 9, 11, 12, 15],
'min_samples_leaf': [5, 10, 20, 25],
# 'min_weight_fraction_leaf': [.5],
'max_features': [None],
}
# Type of scoring used to compare parameter combinations
scorer = 'neg_mean_squared_error'
# Run the grid search
grid_obj = GridSearchCV(estimator=dtree_tuned,param_grid=parameters, cv=10, verbose=1, scoring = scorer)
grid_obj = grid_obj.fit(X_train, y_train['price_log'])
# Set the model to the best combination of parameters
dtree_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data
dtree_tuned.fit(X_train,y_train['price_log'])
Fitting 10 folds for each of 8 candidates, totalling 80 fits
DecisionTreeRegressor(min_samples_leaf=10, random_state=1)
# Get score of tuned model
dtree_tuned_model = get_model_score(dtree_tuned)
R-square on training set : 0.8892250655242453
R-square on test set : 0.7862076836289769
RMSE on training set : 3.794006794630731
RMSE on test set : 4.90285259664256
Observations and insights: _
- Tuning min_samples_leaf alone resulted in negative R-squared values and high RMSE, so I stopped tuning that parameter on its own.
- Increasing the number of features considered at each split improved the scores, but the model still overfits.
- Setting max_features to None helped the model, but it still shows overfitting.
- Tuning these parameters was computationally heavy and time-consuming. With more time, I would try other combinations.
Feature Importance
# Print important features of the tuned decision tree (as done for the untuned tree)
print(pd.DataFrame(dtree_tuned.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
                              Imp
Power                    0.678275
Year                     0.245198
Engine                   0.024863
Mileage                  0.010325
kilometers_driven_log    0.008252
...                           ...
Model_Endeavour3.0L      0.000000
Model_Endeavour3.2       0.000000
Model_Endeavour4x2       0.000000
Model_EndeavourHurricane 0.000000
Model_redi-GOT           0.000000

[738 rows x 1 columns]
#plot graph of feature importances for Tuned Decision Tree for better analysis
plt.figure(figsize = (12,8))
feat_importances = pd.Series(dtree_tuned.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
plt.show()
Feature Importance: Power is the most important variable for Price, followed by Year, Engine and Mileage.
# Some Important Parameters
# n_estimators : int, default=100 --> The number of trees in the forest.
# max_depth : int, default=None --> The maximum depth of the tree. If None, then nodes are expanded
#     until all leaves are pure or until all leaves contain less than min_samples_split samples.
# min_samples_split : int or float, default=2 --> The minimum number of samples required to split an internal node.
# min_samples_leaf : int or float, default=1 --> The minimum number of samples required to be at a leaf node.
# min_weight_fraction_leaf : float, default=0.0 --> The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
# max_features : {"sqrt", "log2", None}, int or float, default=1.0 --> The number of features to consider when looking for the best split.
# max_leaf_nodes : int, default=None --> Grow trees with max_leaf_nodes in best-first fashion.
#     Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
# min_impurity_decrease : float, default=0.0 --> A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
# max_samples : int or float, default=None --> If bootstrap is True, the number of samples to draw from X to train each base estimator.
# Choose the type of Regressor
randomforest_tuned = RandomForestRegressor(random_state=1)
# Define the parameters for Grid to choose from
parameters={'max_depth': [1, 2, 3, 5, 7, 9, 10, 11, 12],
'min_samples_leaf': [5, 10, 20, 25],
'max_features': [None]
}
# Check documentation for all the parameters that the model takes and play with those: see above
# Type of scoring used to compare parameter combinations
from sklearn import metrics
scorer = metrics.make_scorer(metrics.mean_absolute_error, greater_is_better=False)
# An alternative, larger parameter grid (not used in the search below):
# param_grid = {
#     'n_estimators': [50, 100, 200],
#     'max_depth': [None, 5, 10],
#     'min_samples_split': [2, 5, 10],
#     'min_samples_leaf': [1, 2, 4],
#     'max_features': ['sqrt', 'log2', None]   # 'auto' is deprecated
# }
# Run the grid search
grid_obj = GridSearchCV(estimator=randomforest_tuned,param_grid=parameters,
cv=10, verbose=1, scoring = scorer)
grid_obj = grid_obj.fit(X_train, y_train['price_log'])
# Set the model to the best combination of parameters
randomforest_tuned=grid_obj.best_estimator_
# Fit the best algorithm to the data
randomforest_tuned.fit(X_train,y_train['price_log'])
Fitting 10 folds for each of 36 candidates, totalling 360 fits
RandomForestRegressor(max_depth=12, max_features=None, min_samples_leaf=5,
random_state=1)
# Get score of tuned model
randomforest_tuned_model = get_model_score(randomforest_tuned)
R-square on training set : 0.9209426273014948
R-square on test set : 0.8353974692885664
RMSE on training set : 3.2051513501650515
RMSE on test set : 4.3020063205316985
# Print important features of the tuned random forest (as done for the decision trees)
print(pd.DataFrame(randomforest_tuned.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
                           Imp
Power                 0.656538
Year                  0.240443
Engine                0.030342
Mileage               0.014148
kilometers_driven_log 0.012575
...                        ...
Model_Fiesta1.5       0.000000
Model_Fiesta1.6       0.000000
Model_FiestaClassic   0.000000
Model_FiestaDiesel    0.000000
Model_redi-GOT        0.000000

[738 rows x 1 columns]
#plot graph of feature importances for Random Forest for better visualization
plt.figure(figsize = (12,8))
feat_importances = pd.Series(randomforest_tuned.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
plt.show()
Observations and insights:
- Overall the model looks good but is still overfitting.
- Tuning these parameters was computationally heavy and time-consuming. With more time, I would try other combinations.
Feature Importance: Power is the most important variable for Price, followed by Year, Engine and Mileage.
# Create KNN Model
from sklearn.neighbors import KNeighborsRegressor
knn= KNeighborsRegressor()
knn.fit(X_train, y_train["price_log"])
get_model_score(knn)
R-square on training set : 0.8924922061501867
R-square on test set : 0.7860474214077737
RMSE on training set : 3.7376387902590227
RMSE on test set : 4.904689881687676
[0.8924922061501867, 0.7860474214077737, 3.7376387902590227, 4.904689881687676]
knn_model = get_model_score(knn)
R-square on training set : 0.8924922061501867
R-square on test set : 0.7860474214077737
RMSE on training set : 3.7376387902590227
RMSE on test set : 4.904689881687676
In a non-parametric model such as KNeighborsRegressor, the feature importances cannot be determined as easily as in a parametric model like linear regression. However, there are some methods we can use to get an understanding of which features are affecting the target variable:
Feature Selection: We can use feature selection techniques like Recursive Feature Elimination (RFE) or SelectFromModel to find the most important features.
Correlation: We can calculate the correlation between the features and the target variable and select the features with the highest correlation.
Permutation Importance: We can use permutation importance to determine the feature importances by randomly shuffling the values of a single feature and measuring the impact on the model's performance.
These are just some methods to understand the feature importances in a non-parametric model like KNeighborsRegressor. Note that these methods may not be as interpretable as the coefficients in a linear regression model, but they can still provide valuable insights into the features that are affecting the target variable.
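As a concrete illustration of the permutation-importance option, scikit-learn ships `permutation_importance` in `sklearn.inspection`. A minimal sketch on synthetic data (a stand-in, not the cars dataset; with `shuffle=False` the informative features are the first two columns):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in: two informative features out of five
X, y = make_regression(n_samples=400, n_features=5, n_informative=2,
                       noise=1.0, shuffle=False, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

knn = KNeighborsRegressor().fit(X_tr, y_tr)

# Shuffle one column at a time on held-out data and record the drop in R^2
result = permutation_importance(knn, X_te, y_te, n_repeats=10, random_state=1)
for i in np.argsort(result.importances_mean)[::-1]:
    print("feature %d: %.3f" % (i, result.importances_mean[i]))
```

In the notebook this would be `permutation_importance(knn, X_test, y_test['price_log'], ...)` with the fitted KNN model above.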
import os
import xgboost
from xgboost import XGBRegressor
xgb = xgboost.XGBRegressor()
xgb.fit(X_train, y_train["price_log"])
get_model_score(xgb)
R-square on training set : 0.979436780937607
R-square on test set : 0.9036778306183256
RMSE on training set : 1.6346429452129905
RMSE on test set : 3.290909336003646
[0.979436780937607, 0.9036778306183256, 1.6346429452129905, 3.290909336003646]
xgb_model = get_model_score(xgb)
R-square on training set : 0.979436780937607
R-square on test set : 0.9036778306183256
RMSE on training set : 1.6346429452129905
RMSE on test set : 3.290909336003646
# Print important features of xgb
print(pd.DataFrame(xgb.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False))
                          Imp
Power                0.233674
Transmission_Manual  0.091888
Fuel_Type_Diesel     0.042929
Engine               0.037026
Year                 0.034153
...                       ...
Model_FiestaTitanium 0.000000
Model_FiestaEXi      0.000000
Model_Fiesta1.6      0.000000
Model_Fiesta1.5      0.000000
Model_redi-GOT       0.000000

[738 rows x 1 columns]
plt.figure(figsize = (12,8))
feat_importances = pd.Series(xgb.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
plt.show()
# identify the dummy variables so they can be excluded
# (assumption: the one-hot columns were created with a "Prefix_Level" pattern
#  from the categorical variables listed in the data dictionary)
dummy_prefixes = ('Location_', 'Fuel_Type_', 'Transmission_', 'Owner_Type_', 'Brand_', 'Model_')
dummy_features = [col for col in X_train.columns if col.startswith(dummy_prefixes)]
original_features = [col for col in X_train.columns if col not in dummy_features]
dummy_indices = [i for i, feature in enumerate(X_train.columns) if feature in dummy_features]
# compute feature importances excluding dummy variables
importance = xgb.feature_importances_
importance[dummy_indices] = 0
# plot the top 7 feature importances of original features only
plt.figure(figsize = (12,8))
feat_importances = pd.Series(importance, index=X_train.columns)
feat_importances = feat_importances[original_features]
feat_importances = feat_importances.sort_values(ascending=False)
feat_importances[:7].plot(kind='barh')
plt.show()
# Hyperparameter tuning for XGBoost
# from xgboost import XGBRegressor - done previously for xgboost
# from sklearn.model_selection import GridSearchCV - done for random forest
# Define the parameters to be tuned
parameters_grid_xgb = {'learning_rate': [0.1, 0.01, 0.001],
'max_depth': [3, 5, 7],
'subsample': [0.6, 0.8, 1.0],
'gblinear': ['gblinear'],   # note: not a valid XGBoost parameter name (the intended key is 'booster'), so it is ignored -- see the warning in the output
'random_state' : [1],
'objective': ["reg:squarederror"],
'base_score': [0.2, 0.3, 0.5, 0.6]
}
#parameters_grid_xgb = {'n_estimators': [100, 300, 500],
# 'learning_rate': [0.1, 0.01, 0.001],
# 'max_depth': [3, 5, 7],
# 'subsample': [0.6, 0.8, 1.0]}
# Create the grid search object
xgb_tuned = XGBRegressor()
grid_search = GridSearchCV(xgb_tuned, parameters_grid_xgb, cv=5, n_jobs=-1, verbose=2)
# Fit the grid search to the data
grid_search.fit(X_train, y_train["price_log"])
# Train a new XGBoost model with the best hyperparameters
# (note: only max_depth and learning_rate are carried over; the other entries of best_params_ are dropped here)
xgb_tuned = xgboost.XGBRegressor(max_depth=grid_search.best_params_['max_depth'],
                                 learning_rate=grid_search.best_params_['learning_rate'])
xgb_tuned.fit(X_train, y_train["price_log"])
# Print the best parameters and the best score
print("Best parameters: ", grid_search.best_params_)
print("Best score: ", grid_search.best_score_)
Fitting 5 folds for each of 108 candidates, totalling 540 fits
[12:57:51] WARNING: C:/buildkite-agent/builds/buildkite-windows-cpu-autoscaling-group-i-08de971ced8a8cdc6-1/xgboost/xgboost-ci-windows/src/learner.cc:767:
Parameters: { "gblinear" } are not used.
Best parameters: {'base_score': 0.2, 'gblinear': 'gblinear', 'learning_rate': 0.1, 'max_depth': 7, 'objective': 'reg:squarederror', 'random_state': 1, 'subsample': 0.6}
Best score: 0.9437869065135688
xgb_tuned_model = get_model_score(xgb_tuned)
R-square on training set : 0.9753569578882522
R-square on test set : 0.900043951077762
RMSE on training set : 1.7894703752028756
RMSE on test set : 3.3524115677680077
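One detail worth noting: with the default `refit=True`, `GridSearchCV` already refits the winning parameter combination on the full training data, so `grid_search.best_estimator_` can be used directly; re-instantiating the regressor with only `max_depth` and `learning_rate` silently drops the other tuned parameters (`subsample`, `base_score`, etc.). A minimal sketch of this pattern with a lightweight estimator on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=2.0, random_state=1)

grid = GridSearchCV(DecisionTreeRegressor(random_state=1),
                    {"max_depth": [3, 5, 7], "min_samples_leaf": [5, 10]},
                    cv=5, scoring="neg_mean_squared_error")
grid.fit(X, y)

# With refit=True (the default) the winning configuration is already
# refit on the full training data -- no manual re-instantiation needed
best = grid.best_estimator_
print(grid.best_params_)
print("train R2:", best.score(X, y))
```

The same applies to the XGBoost search above: `grid_search.best_estimator_` carries every tuned parameter at once.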
print(xgb_tuned.feature_importances_)
[4.82342169e-02 3.04709561e-03 2.69879419e-02 2.22238347e-01 4.91382927e-03 ...
 ... 1.76820485e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00]
(array of 738 feature importances; most of the one-hot dummy columns are exactly zero)
#plot graph of feature importances for better visualization
plt.figure(figsize = (12,8))
feat_importances = pd.Series(xgb_tuned.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
plt.show()
from sklearn.ensemble import AdaBoostRegressor
# define model
ada_regr = AdaBoostRegressor(random_state=0)
# Fitting the model
ada_regr.fit(X_train, y_train['price_log'])
AdaBoostRegressor(n_estimators=100, random_state=0)
# Model Performance on the test data
ada_score = get_model_score(ada_regr)
R-square on training set : 0.697458964788525
R-square on test set : 0.6681917030759577
RMSE on training set : 6.270028252038547
RMSE on test set : 6.107963185948226
#plot graph of feature importances for better visualization
plt.figure(figsize = (12,8))
feat_importances = pd.Series(ada_regr.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
plt.show()
from sklearn.ensemble import GradientBoostingRegressor
# define model
gradient_reg = GradientBoostingRegressor(random_state=0)
# Fitting the model
gradient_reg.fit(X_train, y_train['price_log'])
GradientBoostingRegressor(random_state=0)
# Model Performance on the test data
gradient_score = get_model_score(gradient_reg)
R-square on training set : 0.916006981282117
R-square on test set : 0.8558999137194322
RMSE on training set : 3.3036874114402477
RMSE on test set : 4.025176333238123
#plot graph of feature importances for better visualization
plt.figure(figsize = (12,8))
feat_importances = pd.Series(gradient_reg.feature_importances_, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh')
plt.show()
Observations and insights: Gradient Boosting clearly outperforms AdaBoost (test R2 of about 0.856 vs. 0.668). AdaBoost's low R2 on both the training and test sets suggests it is underfitting the data, while Gradient Boosting generalizes reasonably well, though it still falls short of the tuned XGBoost model.
# Defining list of models you have trained
#models = [lr, olsmodel1, olsmodel2, dtree, ridge, dtree_tuned,clf, randomforest_tuned,knn,xgb,ada_regr,gradient_reg]
models = [lr,ridge,dtree,dtree_tuned, clf,randomforest_tuned,knn,xgb,xgb_tuned, ada_regr,gradient_reg]
# Defining empty lists to add train and test results
r2_train = []
r2_test = []
rmse_train = []
rmse_test = []
# Looping through all the models to get the rmse and r2 scores
for model in models:
    # Get the R2 and RMSE scores for this model
    scores = get_model_score(model, False)
    r2_train.append(scores[0])
    r2_test.append(scores[1])
    rmse_train.append(scores[2])
    rmse_test.append(scores[3])
# We exclude OLS (R2 and Adjusted R2) and Lasso from the comparison as they are not good contenders for the model
# comparison_frame = pd.DataFrame({'Model':['Linear Regression','OLS - R2', 'OLS - AdjR2','Decision Tree', 'Ridge','Tuned Decision Tree','Tuned Random Forest','KNN','XGBoost','ADABoost', 'GradiantBoost'],
# 'Train_r2': r2_train,'Test_r2': r2_test,
# 'Train_RMSE': rmse_train,'Test_RMSE': rmse_test})
comparison_frame = pd.DataFrame({'Model':['Linear Regression','Ridge','Decision Tree', 'Tuned Decision Tree','Random Forest','Tuned Random Forest','KNN','XGBoost','XGBoost Tuned','ADABoost', 'GradientBoost'],
'Train_r2': r2_train,'Test_r2': r2_test,
'Train_RMSE': rmse_train,'Test_RMSE': rmse_test})
comparison_frame
| | Model | Train_r2 | Test_r2 | Train_RMSE | Test_RMSE |
|---|---|---|---|---|---|
| 0 | Linear Regression | 0.963233 | 0.888446 | 2.185791 | 3.541570 |
| 1 | Ridge | 0.952489 | 0.912363 | 2.484690 | 3.139037 |
| 2 | Decision Tree | 0.999999 | 0.804616 | 0.010430 | 4.687024 |
| 3 | Tuned Decision Tree | 0.889225 | 0.786208 | 3.794007 | 4.902853 |
| 4 | Random Forest | 0.983413 | 0.875839 | 1.468100 | 3.736332 |
| 5 | Tuned Random Forest | 0.920943 | 0.835397 | 3.205151 | 4.302006 |
| 6 | KNN | 0.892492 | 0.786047 | 3.737639 | 4.904690 |
| 7 | XGBoost | 0.979437 | 0.903678 | 1.634643 | 3.290909 |
| 8 | XGBoost Tuned | 0.975357 | 0.900044 | 1.789470 | 3.352412 |
| 9 | ADABoost | 0.697459 | 0.668192 | 6.270028 | 6.107963 |
| 10 | GradientBoost | 0.916007 | 0.855900 | 3.303687 | 4.025176 |
import matplotlib.pyplot as plt
plt.figure(figsize=(15,5))
# Set the width of the bar
barWidth = 0.4
# Set the position of the bars
bar1 = np.arange(len(comparison_frame))
bar2 = [x + barWidth for x in bar1]
# Create the bars for R-squared
plt.bar(bar1, comparison_frame['Train_r2'], width=barWidth, edgecolor='black', label='Train R-squared', color=[0.68, 0.85, 0.90])
plt.bar(bar2, comparison_frame['Test_r2'], width=barWidth, edgecolor='black', label='Test R-squared', color=[1.00, 0.80, 0.60])
# Create the bars for RMSE
plt.bar(bar1, comparison_frame['Train_RMSE'], width=barWidth, edgecolor='black', label='Train RMSE', bottom=comparison_frame['Train_r2'])
plt.bar(bar2, comparison_frame['Test_RMSE'], width=barWidth, edgecolor='black', label='Test RMSE', bottom=comparison_frame['Test_r2'])
# Add axis labels and a title
plt.xlabel('Model')
plt.ylabel('Values')
plt.title('Comparison of R-squared and RMSE Values ')
# Set the x-axis tick labels
plt.xticks([r + barWidth/2 for r in range(len(comparison_frame))], comparison_frame['Model'], rotation=45)
# Create the legend
plt.legend()
# Show the plot
plt.show()
# Exclude OLS (both R2 and Adjusted R2) and Lasso from graphic as they are throwing off the visualization and are not good contenders for the model
Observations:
We excluded 3 of the 15 models entirely from the analysis because their results were poor: OLS (both R2 and Adjusted R2) and Lasso. Almost all of the remaining models had similar R2 values; the main variation was in the RMSE values. Our criteria were that the R2 values should be as close to 1 as possible, and that the train and test RMSE values should be both low and close to each other.
REFINED INSIGHTS:
Based on the data, the most important factors affecting the price of a used car are Power, Year, Engine, and Mileage. It is particularly interesting that Power appeared as the most important factor in every model except the scikit-learn linear regression. Based on prior domain knowledge, I would have expected Year to be the most significant factor.
COMPARISON OF TECHNIQUES AND THEIR RELATIVE PERFORMANCE:
R-squared is a measure of how well the model fits the data, with a value of 1 indicating a perfect fit; a higher R-squared value indicates a better fit. RMSE measures the difference between the predicted and actual values, and a lower value indicates a better fit. Ranked from worst to best by test RMSE, the models are: ADABoost, KNN, Tuned Decision Tree, Decision Tree, Tuned Random Forest, GradientBoost, Random Forest, Linear Regression, XGBoost Tuned, XGBoost, and Ridge.
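The ranking by test RMSE can be reproduced programmatically. The sketch below rebuilds a small comparison table from the test metrics reported above and sorts it (lower RMSE is better):

```python
import pandas as pd

# Test metrics copied from the comparison table above
results = pd.DataFrame({
    'Model': ['Linear Regression', 'Ridge', 'Decision Tree', 'Tuned Decision Tree',
              'Random Forest', 'Tuned Random Forest', 'KNN', 'XGBoost',
              'XGBoost Tuned', 'ADABoost', 'GradientBoost'],
    'Test_r2': [0.888446, 0.912363, 0.804616, 0.786208, 0.875839, 0.835397,
                0.786047, 0.903678, 0.900044, 0.668192, 0.855900],
    'Test_RMSE': [3.541570, 3.139037, 4.687024, 4.902853, 3.736332, 4.302006,
                  4.904690, 3.290909, 3.352412, 6.107963, 4.025176],
})

# Sort best-first by test RMSE (lower is better)
ranked = results.sort_values('Test_RMSE').reset_index(drop=True)
print(ranked[['Model', 'Test_RMSE']])
```

Note that by test RMSE (and test R2) alone, Ridge narrowly edges out XGBoost in this table.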
SCOPE FOR IMPROVEMENT:
Based on the given test R2 and test RMSE values for the XGBoost model, it appears that there is some scope for improvement. The XGBoost model has a test R2 of 0.903678 and a test RMSE of 3.290909. These metrics indicate that the model is not perfectly capturing the target variable, and there may be some room for improvement.
FURTHER IMPROVEMENT:
Feature selection: I could look at other methods such as Recursive Feature Elimination; removing features that have a low impact on the model can improve its performance and reduce overfitting. The algorithm works by removing the least important features based on the weights or coefficients of the model, and the process is repeated until a desired number of features is reached.
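A minimal sketch of Recursive Feature Elimination with scikit-learn's `RFE`, using synthetic data from `make_regression` as a stand-in for the used-car features:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic stand-in: 10 features, only 4 of which are informative
X_demo, y_demo = make_regression(n_samples=200, n_features=10,
                                 n_informative=4, random_state=0)

# Recursively drop the weakest feature (by coefficient magnitude) until 5 remain
selector = RFE(LinearRegression(), n_features_to_select=5, step=1)
selector.fit(X_demo, y_demo)

print(selector.support_)   # boolean mask of kept features
print(selector.ranking_)   # rank 1 = kept; higher rank = eliminated earlier
```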
Data augmentation: Techniques such as random rotations, shifts, and flips apply to image data; for tabular data like ours, I could instead generate additional rows, for example by adding small random noise to existing samples or creating synthetic examples, to increase the size of the dataset and reduce overfitting.
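A minimal sketch of noise-jitter augmentation for tabular data; the data here is synthetic, and the 1% noise scale is an illustrative choice, not a tuned value:

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))  # stand-in for the numeric feature matrix

# Duplicate the rows with small Gaussian jitter (1% of each column's std)
noise_scale = 0.01 * X_demo.std(axis=0)
X_aug = np.vstack([X_demo, X_demo + rng.normal(scale=noise_scale, size=X_demo.shape)])

print(X_aug.shape)  # dataset doubled in size
```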
Feature Engineering: I could look at improving the quality and relevance of the features. This could involve creating new features from existing ones, transforming features, or removing irrelevant or redundant features. For example, we could eliminate Owner, which does not appear as a high-importance feature.
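As an example of deriving a new feature from an existing one, a car-age column can be computed from `Year`; the reference year 2019 below is an assumption for illustration, not taken from the dataset:

```python
import pandas as pd

# Hypothetical slice of the dataset; 'Year' and 'Price' follow the Data Dictionary
cars = pd.DataFrame({'Year': [2012, 2015, 2018], 'Price': [4.5, 6.0, 9.5]})

# Derive car age relative to an assumed data-collection year of 2019
cars['Car_Age'] = 2019 - cars['Year']
print(cars)
```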
Further hyperparameter tuning: XGBoost has several hyperparameters that can be tuned to improve its performance, including the learning rate, the number of trees, and the maximum depth of trees, to find the optimal values for the dataset. I tried a number of combinations, and they resulted in values similar to the non-tuned model. Tuning these parameters was computationally heavy and time-consuming. With more time, I would try other combinations to determine the one that yields the best performance.
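A sketch of randomized hyperparameter search over the parameters named above; it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost (to avoid assuming the `xgboost` package is installed), with a deliberately small illustrative grid:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the training data
X_demo, y_demo = make_regression(n_samples=200, n_features=8, random_state=0)

# Small illustrative grid; a real search would cover much wider ranges
param_distributions = {
    'n_estimators': [50, 100],
    'learning_rate': [0.05, 0.1, 0.2],
    'max_depth': [2, 3],
}
search = RandomizedSearchCV(GradientBoostingRegressor(random_state=0),
                            param_distributions, n_iter=4, cv=3,
                            scoring='neg_root_mean_squared_error',
                            random_state=0)
search.fit(X_demo, y_demo)
print(search.best_params_)
```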
PROPOSAL FOR FINAL SOLUTION DESIGN:
FINAL SOLUTION DESIGN
About XGBoost:
XGBoost (eXtreme Gradient Boosting) is an open-source software library that provides a fast and efficient implementation of gradient boosting for machine learning. It is an ensemble method based on the gradient boosting algorithm: an iterative optimization process that trains a sequence of weak models, where each subsequent model aims to correct the errors of the previous ones, so that the combined predictions minimize a loss function, such as mean squared error for regression. The final prediction is made by combining the predictions of all the individual models. In XGBoost, the individual models are decision trees built sequentially: each new tree is fitted to the residual errors (the gradients of the loss function) of the current ensemble, so subsequent trees put more emphasis on the examples the ensemble currently predicts poorly. (The related AdaBoost algorithm achieves a similar effect by explicitly increasing the weights of mispredicted training instances.) These individual trees then combine into a strong and more precise model. More generally, XGBoost can work on regression, classification, ranking, and user-defined prediction problems.
https://www.geeksforgeeks.org/xgboost/
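The residual-fitting loop described above can be sketched from scratch with shallow scikit-learn trees. This is a bare-bones illustration of gradient boosting under squared-error loss (where the negative gradient is simply the residual), not XGBoost itself:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=5.0, random_state=0)

# Start from the mean prediction, then let each shallow tree fit the residuals
learning_rate = 0.1
pred = np.full(len(y_demo), y_demo.mean())
errors = []
for _ in range(50):
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X_demo, y_demo - pred)           # fit the current residuals
    pred += learning_rate * tree.predict(X_demo)
    errors.append(np.mean((y_demo - pred) ** 2))

# Training error shrinks as trees are added
print(errors[0], errors[-1])
```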
Recommendation:
We conducted exploratory data analysis (EDA) on the used_car dataset. We then created 13 regression models and evaluated their performance (R2 and RMSE values). Our investigation revealed that XGBoost is the best way to predict the price of a used car based on the Data Dictionary provided, for the following reasons:
Handling of numerical and categorical features: XGBoost handled both numerical and categorical features, making it well-suited for this problem as it handled the numerical features such as "Year", "Kilometers_driven", "Mileage", "Engine", "Power", and "New_Price", as well as categorical features such as "Location", "Fuel_Type", "Transmission", "Owner", and "Seats".
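Note that XGBoost's scikit-learn interface expects numeric inputs, so string-valued categoricals such as "Fuel_Type" are typically one-hot encoded first (as was presumably done during preprocessing here). A sketch on a tiny hypothetical slice of the data, with column names taken from the Data Dictionary:

```python
import pandas as pd

# Tiny hypothetical slice of the used-car data
cars = pd.DataFrame({
    'Year': [2014, 2017, 2012],
    'Kilometers_Driven': [45000, 30000, 80000],
    'Fuel_Type': ['Petrol', 'Diesel', 'Petrol'],
    'Transmission': ['Manual', 'Automatic', 'Manual'],
})

# One-hot encode the categoricals; drop_first avoids redundant columns
encoded = pd.get_dummies(cars, columns=['Fuel_Type', 'Transmission'],
                         drop_first=True)
print(encoded.columns.tolist())
```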
Model interpretability: XGBoost provided built-in feature importance scores and visualization tools, which helped to understand the relative importance of different features in determining the price of a used car. This made it easier to understand and communicate the results of my model, as well as to identify potential areas for improvement.
Non-linear relationships: XGBoost handled non-linear relationships between features and target outcomes, which is important as the relationship between features such as "Kilometers_driven", "Engine", and "Power" and the target "Price" are likely to be non-linear.
Handling of missing values: XGBoost can handle missing values, which may be present in the dataset, making it a versatile tool for regression tasks. While we handled missing values in Milestone 1, it is useful to know that this is the case for future data analysis.
Performance: XGBoost is known for its fast training speed and high prediction accuracy, which makes it well-suited for large datasets. Since I am not familiar with how long training should take on data of this size, I am citing this as a general advantage of XGBoost.
Scalability: From the literature, one of the key strengths of XGBoost is its scalability. It can handle datasets with millions of examples and thousands of features, making it a popular choice for working with big data. Additionally, XGBoost has a number of advanced features that make it a highly customizable and flexible tool, such as support for parallel processing, tree pruning, and weighting of examples. Scalability may become important as more data becomes available, and as the company correlates different databases to derive intelligent insights other than pricing. For example, Cars4U can use the existing data and add vehicle service contract information to help determine the correct price of the extended warranty.
Handling complexity: Ensemble methods are particularly useful when dealing with complex, non-linear relationships between features and target outcomes. By combining the predictions of multiple models, ensemble methods can often achieve higher accuracy and better generalization performance than individual models. Additionally, ensemble methods can help to mitigate overfitting, which can be a problem when training models on large, complex datasets. Again, this will become even more important if the business wants to use other databases to provide additional value and insights to their customers.
Ease of use: XGBoost automates the process of training individual decision trees and combining their predictions, so that you don't have to worry about manual tuning of model parameters or worry about overfitting. XGBoost also provides a number of hyperparameters that you can tune to control the size and complexity of the individual decision trees, as well as the number of trees and the learning rate used in the optimization process.
Open source: There are several benefits to using open source machine learning models. These advantages are true for all the models used in this milestone. We include them here for completeness as we are asked to substantiate our recommendation for XGBoost.
a) Cost-effective: Open source machine learning models are free to use, which can save a significant amount of money compared to proprietary software. b) Customizable: Since the source code is publicly available, users have the flexibility to modify and tweak the model to better suit their specific use case. c) Large community: Open source machine learning models often have a large and active community of contributors, which can result in regular updates and bug fixes. d) Better integration: Open source machine learning models can be easily integrated into other open source tools and technologies, leading to a more streamlined workflow. e) Transparency: The transparency of open source machine learning models allows users to understand how the model works, which can help build trust in the model's predictions.
Overall, XGBoost can be a good choice for determining the price of a used car based on the above data dictionary as it can handle a variety of data types, provide insights into the relative importance of features, handle non-linear relationships, and achieve high prediction accuracy.